[RFA,tree-optimization/90883] Improve DSE to handle redundant calls

Message ID ab6171bd-694f-f18a-633e-39bdbdf7ba2c@redhat.com
State New
Headers show
Series
  • [RFA,tree-optimization/90883] Improve DSE to handle redundant calls
Related show

Commit Message

Jeff Law June 26, 2019, 4:16 a.m.
So based on the conversation in the BZ I cobbled together a patch to
extend tree-ssa-dse.c to also detect redundant stores.

To be clear, given two stores, the first store is dead if the later
store overwrites all the live bytes set by the first store.   In this
case we delete the first store.  If the first store is partially dead we
may trim it.

Given two stores, the second store is redundant if it stores _the same
value_ into locations set by the first store.  In this case we delete
the second store.


We prefer to remove redundant stores over removing dead or trimming
partially dead stores.

First, if we detect a redundant store, we can always remove it.  We may
not always be able to trim a partially dead store.  So removing the
redundant store wins in this case.

But even if the redundant store occurs at the head or tail of the prior
store, removing the redundant store is better than trimming the
partially dead store because we end up with fewer calls to memset with
the same number of total bytes written.

We only look for redundant stores in a few cases.  The first store must
be a memset, empty constructor or calloc call -- ie things which
initialize multiple memory locations to zero.  Subsequent stores can
occur via memset, empty constructors or simple memory assignments.

The chagne to tree-ssa-alias.c deserves a quick note.

When we're trying to determine if we have a redundant store, we create
an AO_REF for the *second* store, then ask the alias system if the first
store would kill the AO_REF.

So while normally a calloc wouldn't ever kill anything in normal
execution order, we're not asking about things in execution order.  We
really just want to know if the calloc is going to write into the
entirety of the AO_REF of the subsequent store.  So we compute the size
of the allocation and we know the destination from the LHS of the calloc
call and everything "just works".

This patch also includes a hunk I apparently left out from yesterday's
submission which just adds _CHK cases to all the existing BUILT_IN_MEM*
cases.  That's what I get for writing this part first, then adding the
_CHK stuff afterwards, then reversing the order of submission.

This includes a slightly reduced testcase from the BZ in g++.dg -- it's
actually a good way to capture when one empty constructor causes another
empty constructor to be redundant.  The gcc.dg cases capture other
scenarios.

This has been bootstrapped and regression tested on x86-64, i686, ppc64,
ppc64le, sparc64 & aarch64.  It's also bootstrapped on various arm
targets, alpha, m68k, mips, riscv64, sh4.  It's been built and tested on
a variety of *-elf targets as well as various *-linux-gnu targets as
crosses.  ANd just for giggles it was tested before the changes to add
the _CHK support, so it works with and without that as well.

OK for the trunk?

Jeff
* tree-ssa-alias.c (stmt_kills_ref_p): Handle BUILT_IN_CALLOC.
	* tree-ssa-dse.c: Update various comments to distinguish between
	dead and redundant stores.
	(initialize_ao_ref_for_dse): Handle BUILT_IN_CALLOC.
	(dse_optimize_redundant_stores): New function.
	(delete_dead_or_redundant_call): Renamed from delete_dead_call.
	Distinguish between dead and redundant calls in dump output.  All
	callers updated.
	(delete_dead_or_redundant_assignment): Similarly for assignments.
	(dse_optimize_stmt): Handle _CHK variants.  For statements which
	store 0 into multiple memory locations, try to prove a subsequent
	store is redundant.
	
	* g++.dg/tree-ssa/pr90883.C: New test.
	* gcc.dg/tree-ssa/ssa-dse-36.c: New test.

Comments

Richard Biener June 26, 2019, 11:53 a.m. | #1
On Wed, Jun 26, 2019 at 6:17 AM Jeff Law <law@redhat.com> wrote:
>

> So based on the conversation in the BZ I cobbled together a patch to

> extend tree-ssa-dse.c to also detect redundant stores.

>

> To be clear, given two stores, the first store is dead if the later

> store overwrites all the live bytes set by the first store.   In this

> case we delete the first store.  If the first store is partially dead we

> may trim it.

>

> Given two stores, the second store is redundant if it stores _the same

> value_ into locations set by the first store.  In this case we delete

> the second store.

>

>

> We prefer to remove redundant stores over removing dead or trimming

> partially dead stores.

>

> First, if we detect a redundant store, we can always remove it.  We may

> not always be able to trim a partially dead store.  So removing the

> redundant store wins in this case.

>

> But even if the redundant store occurs at the head or tail of the prior

> store, removing the redundant store is better than trimming the

> partially dead store because we end up with fewer calls to memset with

> the same number of total bytes written.

>

> We only look for redundant stores in a few cases.  The first store must

> be a memset, empty constructor or calloc call -- ie things which

> initialize multiple memory locations to zero.  Subsequent stores can

> occur via memset, empty constructors or simple memory assignments.

>

> The chagne to tree-ssa-alias.c deserves a quick note.

>

> When we're trying to determine if we have a redundant store, we create

> an AO_REF for the *second* store, then ask the alias system if the first

> store would kill the AO_REF.

>

> So while normally a calloc wouldn't ever kill anything in normal

> execution order, we're not asking about things in execution order.  We

> really just want to know if the calloc is going to write into the

> entirety of the AO_REF of the subsequent store.  So we compute the size

> of the allocation and we know the destination from the LHS of the calloc

> call and everything "just works".


I see how stmt_kills_ref_p is convenient here and it's the only
ref-must-include-other-ref kind of query the oracle supports right now.
Note it is not optimized for your particular case querying the same
stmt for multiple refs.  It's not refs_must_alias_p that is missing
but something stronger ('kills' is also wrong since both refs might
be reads), ref_covered_by_ref_p or so.  That said, factoring
stmt_kills_ref_p might not be so straight-forward for calls since
we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,
ao_ref_init_loads_from_call?).

So I think the tree-ssa-alias.c change is fine but please put a
comment before

+             if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)
+               {
+                 tree arg0 = gimple_call_arg (stmt, 0);

explaining this is used by DSE to detect redundant stores.

> This patch also includes a hunk I apparently left out from yesterday's

> submission which just adds _CHK cases to all the existing BUILT_IN_MEM*

> cases.  That's what I get for writing this part first, then adding the

> _CHK stuff afterwards, then reversing the order of submission.

>

> This includes a slightly reduced testcase from the BZ in g++.dg -- it's

> actually a good way to capture when one empty constructor causes another

> empty constructor to be redundant.  The gcc.dg cases capture other

> scenarios.

>

> This has been bootstrapped and regression tested on x86-64, i686, ppc64,

> ppc64le, sparc64 & aarch64.  It's also bootstrapped on various arm

> targets, alpha, m68k, mips, riscv64, sh4.  It's been built and tested on

> a variety of *-elf targets as well as various *-linux-gnu targets as

> crosses.  ANd just for giggles it was tested before the changes to add

> the _CHK support, so it works with and without that as well.

>

> OK for the trunk?


I also notice we wouldn't handle

   memset(p, 1, 64);
   memset(p, 1, 32);

(non-zero-initializer) or

  x = {};
  y = {};
  x.a = {};

(intermediate non-aliasing store)

Did you see if / how often this triggers on trunk?

OK.

Thanks,
Richard.

>

> Jeff
Jeff Law June 26, 2019, 6:12 p.m. | #2
On 6/26/19 5:53 AM, Richard Biener wrote:
> On Wed, Jun 26, 2019 at 6:17 AM Jeff Law <law@redhat.com> wrote:

>>

>> So based on the conversation in the BZ I cobbled together a patch to

>> extend tree-ssa-dse.c to also detect redundant stores.

>>

>> To be clear, given two stores, the first store is dead if the later

>> store overwrites all the live bytes set by the first store.   In this

>> case we delete the first store.  If the first store is partially dead we

>> may trim it.

>>

>> Given two stores, the second store is redundant if it stores _the same

>> value_ into locations set by the first store.  In this case we delete

>> the second store.

>>

>>

>> We prefer to remove redundant stores over removing dead or trimming

>> partially dead stores.

>>

>> First, if we detect a redundant store, we can always remove it.  We may

>> not always be able to trim a partially dead store.  So removing the

>> redundant store wins in this case.

>>

>> But even if the redundant store occurs at the head or tail of the prior

>> store, removing the redundant store is better than trimming the

>> partially dead store because we end up with fewer calls to memset with

>> the same number of total bytes written.

>>

>> We only look for redundant stores in a few cases.  The first store must

>> be a memset, empty constructor or calloc call -- ie things which

>> initialize multiple memory locations to zero.  Subsequent stores can

>> occur via memset, empty constructors or simple memory assignments.

>>

>> The chagne to tree-ssa-alias.c deserves a quick note.

>>

>> When we're trying to determine if we have a redundant store, we create

>> an AO_REF for the *second* store, then ask the alias system if the first

>> store would kill the AO_REF.

>>

>> So while normally a calloc wouldn't ever kill anything in normal

>> execution order, we're not asking about things in execution order.  We

>> really just want to know if the calloc is going to write into the

>> entirety of the AO_REF of the subsequent store.  So we compute the size

>> of the allocation and we know the destination from the LHS of the calloc

>> call and everything "just works".

> 

> I see how stmt_kills_ref_p is convenient here and it's the only

> ref-must-include-other-ref kind of query the oracle supports right now.

> Note it is not optimized for your particular case querying the same

> stmt for multiple refs.  It's not refs_must_alias_p that is missing

> but something stronger ('kills' is also wrong since both refs might

> be reads), ref_covered_by_ref_p or so.  That said, factoring

> stmt_kills_ref_p might not be so straight-forward for calls since

> we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,

> ao_ref_init_loads_from_call?).

> 

> So I think the tree-ssa-alias.c change is fine but please put a

> comment before

> 

> +             if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)

> +               {

> +                 tree arg0 = gimple_call_arg (stmt, 0);

> 

> explaining this is used by DSE to detect redundant stores.

> 

>> This patch also includes a hunk I apparently left out from yesterday's

>> submission which just adds _CHK cases to all the existing BUILT_IN_MEM*

>> cases.  That's what I get for writing this part first, then adding the

>> _CHK stuff afterwards, then reversing the order of submission.

>>

>> This includes a slightly reduced testcase from the BZ in g++.dg -- it's

>> actually a good way to capture when one empty constructor causes another

>> empty constructor to be redundant.  The gcc.dg cases capture other

>> scenarios.

>>

>> This has been bootstrapped and regression tested on x86-64, i686, ppc64,

>> ppc64le, sparc64 & aarch64.  It's also bootstrapped on various arm

>> targets, alpha, m68k, mips, riscv64, sh4.  It's been built and tested on

>> a variety of *-elf targets as well as various *-linux-gnu targets as

>> crosses.  ANd just for giggles it was tested before the changes to add

>> the _CHK support, so it works with and without that as well.

>>

>> OK for the trunk?

> 

> I also notice we wouldn't handle

> 

>    memset(p, 1, 64);

>    memset(p, 1, 32);

or even just *p = 1.

> 

> (non-zero-initializer) or

> 

>   x = {};

>   y = {};

>   x.a = {};

> 

> (intermediate non-aliasing store)

Correct in both cases.  Handling the first wouldn't be terribly hard to
implement, I can probably cobble that together quickly to see if it even
triggers.  THe memset-memset case was the least commonly detected IIRC.

The second would likely require more tracking and some of the other
complexities we handle for dead stores.  Possible?  Certainly, not sure
if it's worth the effort.

> 

> Did you see if / how often this triggers on trunk?

They triggered a couple hundred times.    I didn't initially implement
the calloc case, but found it reported in the llvm bug database.  It
triggered more often than I anticipated.

Jeff
Jeff Law June 26, 2019, 6:13 p.m. | #3
On 6/26/19 5:53 AM, Richard Biener wrote:
> On Wed, Jun 26, 2019 at 6:17 AM Jeff Law <law@redhat.com> wrote:

>>

>> So based on the conversation in the BZ I cobbled together a patch to

>> extend tree-ssa-dse.c to also detect redundant stores.

>>

>> To be clear, given two stores, the first store is dead if the later

>> store overwrites all the live bytes set by the first store.   In this

>> case we delete the first store.  If the first store is partially dead we

>> may trim it.

>>

>> Given two stores, the second store is redundant if it stores _the same

>> value_ into locations set by the first store.  In this case we delete

>> the second store.

>>

>>

>> We prefer to remove redundant stores over removing dead or trimming

>> partially dead stores.

>>

>> First, if we detect a redundant store, we can always remove it.  We may

>> not always be able to trim a partially dead store.  So removing the

>> redundant store wins in this case.

>>

>> But even if the redundant store occurs at the head or tail of the prior

>> store, removing the redundant store is better than trimming the

>> partially dead store because we end up with fewer calls to memset with

>> the same number of total bytes written.

>>

>> We only look for redundant stores in a few cases.  The first store must

>> be a memset, empty constructor or calloc call -- ie things which

>> initialize multiple memory locations to zero.  Subsequent stores can

>> occur via memset, empty constructors or simple memory assignments.

>>

>> The chagne to tree-ssa-alias.c deserves a quick note.

>>

>> When we're trying to determine if we have a redundant store, we create

>> an AO_REF for the *second* store, then ask the alias system if the first

>> store would kill the AO_REF.

>>

>> So while normally a calloc wouldn't ever kill anything in normal

>> execution order, we're not asking about things in execution order.  We

>> really just want to know if the calloc is going to write into the

>> entirety of the AO_REF of the subsequent store.  So we compute the size

>> of the allocation and we know the destination from the LHS of the calloc

>> call and everything "just works".

> 

> I see how stmt_kills_ref_p is convenient here and it's the only

> ref-must-include-other-ref kind of query the oracle supports right now.

> Note it is not optimized for your particular case querying the same

> stmt for multiple refs.  It's not refs_must_alias_p that is missing

> but something stronger ('kills' is also wrong since both refs might

> be reads), ref_covered_by_ref_p or so.  That said, factoring

> stmt_kills_ref_p might not be so straight-forward for calls since

> we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,

> ao_ref_init_loads_from_call?).

> 

> So I think the tree-ssa-alias.c change is fine but please put a

> comment before

> 

> +             if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)

> +               {

> +                 tree arg0 = gimple_call_arg (stmt, 0);

> 

> explaining this is used by DSE to detect redundant stores.

Agreed.  I should have done this given I called it out in the email.

Jeff
Bill Schmidt June 27, 2019, 1:14 a.m. | #4
Looks like this patch breaks bootstrap.

/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In function
'void dse\
_optimize_redundant_stores(gimple*)':
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:649:46: error:
ISO C++\
 forbids converting a string constant to 'char*' [-Werror=write-strings]
  649 |   delete_dead_or_redundant_assignment (&gsi, "redundant");
      |                                              ^~~~~~~~~~~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:651:40: error:
ISO C++\
 forbids converting a string constant to 'char*' [-Werror=write-strings]
  651 |   delete_dead_or_redundant_call (&gsi, "redundant");
      |                                        ^~~~~~~~~~~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In member
function 'v\
oid dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)':
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:979:41: error:
ISO C++\
 forbids converting a string constant to 'char*' [-Werror=write-strings]
  979 |     delete_dead_or_redundant_call (gsi, "dead");
      |                                         ^~~~~~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1007:39: error:
ISO C+\
+ forbids converting a string constant to 'char*' [-Werror=write-strings]
 1007 |   delete_dead_or_redundant_call (gsi, "dead");
      |                                       ^~~~~~
/home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1066:49: error:
ISO C+\
+ forbids converting a string constant to 'char*' [-Werror=write-strings]
 1066 |       delete_dead_or_redundant_assignment (gsi, "dead");
      |                                                 ^~~~~~

Thanks,
Bill

On 6/26/19 1:13 PM, Jeff Law wrote:
> On 6/26/19 5:53 AM, Richard Biener wrote:

>> On Wed, Jun 26, 2019 at 6:17 AM Jeff Law <law@redhat.com> wrote:

>>> So based on the conversation in the BZ I cobbled together a patch to

>>> extend tree-ssa-dse.c to also detect redundant stores.

>>>

>>> To be clear, given two stores, the first store is dead if the later

>>> store overwrites all the live bytes set by the first store.   In this

>>> case we delete the first store.  If the first store is partially dead we

>>> may trim it.

>>>

>>> Given two stores, the second store is redundant if it stores _the same

>>> value_ into locations set by the first store.  In this case we delete

>>> the second store.

>>>

>>>

>>> We prefer to remove redundant stores over removing dead or trimming

>>> partially dead stores.

>>>

>>> First, if we detect a redundant store, we can always remove it.  We may

>>> not always be able to trim a partially dead store.  So removing the

>>> redundant store wins in this case.

>>>

>>> But even if the redundant store occurs at the head or tail of the prior

>>> store, removing the redundant store is better than trimming the

>>> partially dead store because we end up with fewer calls to memset with

>>> the same number of total bytes written.

>>>

>>> We only look for redundant stores in a few cases.  The first store must

>>> be a memset, empty constructor or calloc call -- ie things which

>>> initialize multiple memory locations to zero.  Subsequent stores can

>>> occur via memset, empty constructors or simple memory assignments.

>>>

>>> The chagne to tree-ssa-alias.c deserves a quick note.

>>>

>>> When we're trying to determine if we have a redundant store, we create

>>> an AO_REF for the *second* store, then ask the alias system if the first

>>> store would kill the AO_REF.

>>>

>>> So while normally a calloc wouldn't ever kill anything in normal

>>> execution order, we're not asking about things in execution order.  We

>>> really just want to know if the calloc is going to write into the

>>> entirety of the AO_REF of the subsequent store.  So we compute the size

>>> of the allocation and we know the destination from the LHS of the calloc

>>> call and everything "just works".

>> I see how stmt_kills_ref_p is convenient here and it's the only

>> ref-must-include-other-ref kind of query the oracle supports right now.

>> Note it is not optimized for your particular case querying the same

>> stmt for multiple refs.  It's not refs_must_alias_p that is missing

>> but something stronger ('kills' is also wrong since both refs might

>> be reads), ref_covered_by_ref_p or so.  That said, factoring

>> stmt_kills_ref_p might not be so straight-forward for calls since

>> we lack a general ao_ref_init for calls (ao_ref_init_stores_from_call,

>> ao_ref_init_loads_from_call?).

>>

>> So I think the tree-ssa-alias.c change is fine but please put a

>> comment before

>>

>> +             if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)

>> +               {

>> +                 tree arg0 = gimple_call_arg (stmt, 0);

>>

>> explaining this is used by DSE to detect redundant stores.

> Agreed.  I should have done this given I called it out in the email.

>

> Jeff

>
Jeff Law June 27, 2019, 2:23 a.m. | #5
On 6/26/19 7:14 PM, Bill Schmidt wrote:
> Looks like this patch breaks bootstrap.

> 

> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In function

> 'void dse\

> _optimize_redundant_stores(gimple*)':

> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:649:46: error:

> ISO C++\

>  forbids converting a string constant to 'char*' [-Werror=write-strings]

>   649 |   delete_dead_or_redundant_assignment (&gsi, "redundant");

>       |                                              ^~~~~~~~~~~

> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:651:40: error:

> ISO C++\

>  forbids converting a string constant to 'char*' [-Werror=write-strings]

>   651 |   delete_dead_or_redundant_call (&gsi, "redundant");

>       |                                        ^~~~~~~~~~~

> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In member

> function 'v\

> oid dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)':

> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:979:41: error:

> ISO C++\

>  forbids converting a string constant to 'char*' [-Werror=write-strings]

>   979 |     delete_dead_or_redundant_call (gsi, "dead");

>       |                                         ^~~~~~

> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1007:39: error:

> ISO C+\

> + forbids converting a string constant to 'char*' [-Werror=write-strings]

>  1007 |   delete_dead_or_redundant_call (gsi, "dead");

>       |                                       ^~~~~~

> /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1066:49: error:

> ISO C+\

> + forbids converting a string constant to 'char*' [-Werror=write-strings]

>  1066 |       delete_dead_or_redundant_assignment (gsi, "dead");

Strange as I know I've bootstrapped and tested.  I'll take care of it

Thanks,
jeff
Christophe Lyon June 27, 2019, 10:19 a.m. | #6
On Thu, 27 Jun 2019 at 04:23, Jeff Law <law@redhat.com> wrote:
>

> On 6/26/19 7:14 PM, Bill Schmidt wrote:

> > Looks like this patch breaks bootstrap.

> >

> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In function

> > 'void dse\

> > _optimize_redundant_stores(gimple*)':

> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:649:46: error:

> > ISO C++\

> >  forbids converting a string constant to 'char*' [-Werror=write-strings]

> >   649 |   delete_dead_or_redundant_assignment (&gsi, "redundant");

> >       |                                              ^~~~~~~~~~~

> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:651:40: error:

> > ISO C++\

> >  forbids converting a string constant to 'char*' [-Werror=write-strings]

> >   651 |   delete_dead_or_redundant_call (&gsi, "redundant");

> >       |                                        ^~~~~~~~~~~

> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In member

> > function 'v\

> > oid dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)':

> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:979:41: error:

> > ISO C++\

> >  forbids converting a string constant to 'char*' [-Werror=write-strings]

> >   979 |     delete_dead_or_redundant_call (gsi, "dead");

> >       |                                         ^~~~~~

> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1007:39: error:

> > ISO C+\

> > + forbids converting a string constant to 'char*' [-Werror=write-strings]

> >  1007 |   delete_dead_or_redundant_call (gsi, "dead");

> >       |                                       ^~~~~~

> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1066:49: error:

> > ISO C+\

> > + forbids converting a string constant to 'char*' [-Werror=write-strings]

> >  1066 |       delete_dead_or_redundant_assignment (gsi, "dead");

> Strange as I know I've bootstrapped and tested.  I'll take care of it

>

> Thanks,

> jeff



Hi Jeff,

I've also noticed that
FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted
redundant store: .*.a = {}"
on aarch64

Christophe
Rainer Orth June 28, 2019, 7:55 a.m. | #7
Hi Christophe,

> I've also noticed that

> FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted

> redundant store: .*.a = {}"

> on aarch64


it also FAILs on sparc, ia64, m68k, and mips64el.

	Rainer

-- 
-----------------------------------------------------------------------------
Rainer Orth, Center for Biotechnology, Bielefeld University
Jeff Law June 28, 2019, 3:41 p.m. | #8
On 6/28/19 1:55 AM, Rainer Orth wrote:
> Hi Christophe,

> 

>> I've also noticed that

>> FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted

>> redundant store: .*.a = {}"

>> on aarch64

> 

> it also FAILs on sparc, ia64, m68k, and mips64el.

sparc seems to be emitting a loop when we expect an empty constructor
initializer.  I'm digging into the best way to address.

jeff
Jeff Law June 28, 2019, 4:17 p.m. | #9
On 6/28/19 1:55 AM, Rainer Orth wrote:
> Hi Christophe,

> 

>> I've also noticed that

>> FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted

>> redundant store: .*.a = {}"

>> on aarch64

> 

> it also FAILs on sparc, ia64, m68k, and mips64el.

It's an interaction with CLEAR_RATIO that's causing the differences in
the generic code.  Using -Os seems to bias towards using an empty
constructor over a loop.

I suspect that's all we're going to need to fix this on these targets.

jeff
Jeff Law June 28, 2019, 4:32 p.m. | #10
On 6/28/19 1:55 AM, Rainer Orth wrote:
> Hi Christophe,

> 

>> I've also noticed that

>> FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted

>> redundant store: .*.a = {}"

>> on aarch64

> 

> it also FAILs on sparc, ia64, m68k, and mips64el.

alpha is also affected.

Jeff

Patch

diff --git a/gcc/tree-ssa-alias.c b/gcc/tree-ssa-alias.c
index d9307390e4c..2ec35499e21 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -2848,13 +2848,30 @@  stmt_kills_ref_p (gimple *stmt, ao_ref *ref)
 	  case BUILT_IN_MEMSET_CHK:
 	  case BUILT_IN_STRNCPY:
 	  case BUILT_IN_STPNCPY:
+	  case BUILT_IN_CALLOC:
 	    {
 	      /* For a must-alias check we need to be able to constrain
 		 the access properly.  */
 	      if (!ref->max_size_known_p ())
 		return false;
-	      tree dest = gimple_call_arg (stmt, 0);
-	      tree len = gimple_call_arg (stmt, 2);
+	      tree dest;
+	      tree len;
+	      if (DECL_FUNCTION_CODE (callee) == BUILT_IN_CALLOC)
+		{
+		  tree arg0 = gimple_call_arg (stmt, 0);
+		  tree arg1 = gimple_call_arg (stmt, 1);
+		  if (TREE_CODE (arg0) != INTEGER_CST
+		      || TREE_CODE (arg1) != INTEGER_CST)
+		    return false;
+
+		  dest = gimple_call_lhs (stmt);
+		  len = fold_build2 (MULT_EXPR, TREE_TYPE (arg0), arg0, arg1);
+		}
+	      else
+		{
+		  dest = gimple_call_arg (stmt, 0);
+		  len = gimple_call_arg (stmt, 2);
+		}
 	      if (!poly_int_tree_p (len))
 		return false;
 	      tree rbase = ref->base;
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 82e6dbfeb96..9b4d19232ee 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -1,4 +1,4 @@ 
-/* Dead store elimination
+/* Dead and redundant store elimination
    Copyright (C) 2004-2019 Free Software Foundation, Inc.
 
 This file is part of GCC.
@@ -41,12 +41,20 @@  along with GCC; see the file COPYING3.  If not see
 
    A dead store is a store into a memory location which will later be
    overwritten by another store without any intervening loads.  In this
-   case the earlier store can be deleted.
+   case the earlier store can be deleted or trimmed if the store
+   was partially dead.
+
+   A redundant store is a store into a memory location which stores
+   the exact same value as a prior store to the same memory location.
+   While this can often be handled by dead store elimination, removing
+   the redundant store is often better than removing or trimming the
+   dead store.
 
    In our SSA + virtual operand world we use immediate uses of virtual
-   operands to detect dead stores.  If a store's virtual definition
+   operands to detect these cases.  If a store's virtual definition
    is used precisely once by a later store to the same location which
-   post dominates the first store, then the first store is dead.
+   post dominates the first store, then the first store is dead.  If
+   the data stored is the same, then the second store is redundant.
 
    The single use of the store's virtual definition ensures that
    there are no intervening aliased loads and the requirement that
@@ -58,7 +66,9 @@  along with GCC; see the file COPYING3.  If not see
    the point immediately before the later store.  Again, the single
    use of the virtual definition and the post-dominance relationship
    ensure that such movement would be safe.  Clearly if there are
-   back to back stores, then the second is redundant.
+   back to back stores, then the second is makes the first dead.  If
+   the second store stores the same value, then the second store is
+   redundant.
 
    Reviewing section 10.7.2 in Morgan's "Building an Optimizing Compiler"
    may also help in understanding this code since it discusses the
@@ -66,6 +76,8 @@  along with GCC; see the file COPYING3.  If not see
    fact, they are the same transformation applied to different views of
    the CFG.  */
 
+static void delete_dead_or_redundant_assignment (gimple_stmt_iterator *, char []);
+static void delete_dead_or_redundant_call (gimple_stmt_iterator *, char []);
 
 /* Bitmap of blocks that have had EH statements cleaned.  We should
    remove their dead edges eventually.  */
@@ -109,6 +121,25 @@  initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
 	      ao_ref_init_from_ptr_and_size (write, ptr, size);
 	      return true;
 	    }
+
+	  /* A calloc call can never be dead, but it can make
+	     subsequent stores redundant if they store 0 into
+	     the same memory locations.  */
+	  case BUILT_IN_CALLOC:
+	    {
+	      tree nelem = gimple_call_arg (stmt, 0);
+	      tree selem = gimple_call_arg (stmt, 1);
+	      if (TREE_CODE (nelem) == INTEGER_CST
+		  && TREE_CODE (selem) == INTEGER_CST)
+		{
+		  tree lhs = gimple_call_lhs (stmt);
+		  tree size = fold_build2 (MULT_EXPR, TREE_TYPE (nelem),
+					   nelem, selem);
+		  ao_ref_init_from_ptr_and_size (write, lhs, size);
+		  return true;
+		}
+	    }
+
 	  default:
 	    break;
 	}
@@ -557,6 +588,74 @@  check_name (tree, tree *idx, void *data)
   return true;
 }
 
+/* STMT stores the value 0 into one or more memory locations
+   (via memset, empty constructor, calloc call, etc).
+
+   See if there is a subsequent store of the value 0 to one
+   or more of the same memory location(s).  If so, the subsequent
+   store is redundant and can be removed.
+
+   The subsequent stores could be via memset, empty constructors,
+   simple MEM stores, etc.  */
+
+static void
+dse_optimize_redundant_stores (gimple *stmt)
+{
+  int cnt = 0;
+
+  /* We could do something fairly complex and look through PHIs
+     like DSE_CLASSIFY_STORE, but it doesn't seem to be worth
+     the effort.
+
+     Look at all the immediate uses of the VDEF (which are obviously
+     dominated by STMT).   See if one or more stores 0 into the same
+     memory locations a STMT, if so remove the immediate use statements.  */
+  tree defvar = gimple_vdef (stmt);
+  imm_use_iterator ui;
+  gimple *use_stmt;
+  FOR_EACH_IMM_USE_STMT (use_stmt, ui, defvar)
+    {
+      /* Limit stmt walking.  */
+      if (++cnt > PARAM_VALUE (PARAM_DSE_MAX_ALIAS_QUERIES_PER_STORE))
+	BREAK_FROM_IMM_USE_STMT (ui);
+
+      /* If USE_STMT stores 0 into one or more of the same locations
+	 as STMT and STMT would kill USE_STMT, then we can just remove
+	 USE_STMT.  */
+      tree fndecl;
+      if ((is_gimple_assign (use_stmt)
+	   && gimple_vdef (use_stmt)
+	   && ((gimple_assign_rhs_code (use_stmt) == CONSTRUCTOR
+	        && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (use_stmt)) == 0
+	        && !gimple_clobber_p (stmt))
+	       || (gimple_assign_rhs_code (use_stmt) == INTEGER_CST
+		   && integer_zerop (gimple_assign_rhs1 (use_stmt)))))
+	  || (gimple_call_builtin_p (use_stmt, BUILT_IN_NORMAL)
+	      && (fndecl = gimple_call_fndecl (use_stmt)) != NULL
+	      && (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MEMSET
+		  || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MEMSET_CHK)
+	      && integer_zerop (gimple_call_arg (use_stmt, 1))))
+	{
+	  ao_ref write;
+
+	  if (!initialize_ao_ref_for_dse (use_stmt, &write))
+	    BREAK_FROM_IMM_USE_STMT (ui)
+
+	  if (valid_ao_ref_for_dse (&write)
+	      && stmt_kills_ref_p (stmt, &write))
+	    {
+	      gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
+	      if (is_gimple_assign (use_stmt))
+		delete_dead_or_redundant_assignment (&gsi, "redundant");
+	      else if (is_gimple_call (use_stmt))
+		delete_dead_or_redundant_call (&gsi, "redundant");
+	      else
+		gcc_unreachable ();
+	    }
+	}
+    }
+}
+
 /* A helper of dse_optimize_stmt.
    Given a GIMPLE_ASSIGN in STMT that writes to REF, classify it
    according to downstream uses and defs.  Sets *BY_CLOBBER_P to true
@@ -769,12 +868,12 @@  private:
 
 /* Delete a dead call at GSI, which is mem* call of some kind.  */
 static void
-delete_dead_call (gimple_stmt_iterator *gsi)
+delete_dead_or_redundant_call (gimple_stmt_iterator *gsi, char *type)
 {
   gimple *stmt = gsi_stmt (*gsi);
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "  Deleted dead call: ");
+      fprintf (dump_file, "  Deleted %s call: ", type);
       print_gimple_stmt (dump_file, stmt, 0, dump_flags);
       fprintf (dump_file, "\n");
     }
@@ -803,12 +902,12 @@  delete_dead_call (gimple_stmt_iterator *gsi)
 /* Delete a dead store at GSI, which is a gimple assignment. */
 
 static void
-delete_dead_assignment (gimple_stmt_iterator *gsi)
+delete_dead_or_redundant_assignment (gimple_stmt_iterator *gsi, char *type)
 {
   gimple *stmt = gsi_stmt (*gsi);
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
-      fprintf (dump_file, "  Deleted dead store: ");
+      fprintf (dump_file, "  Deleted %s store: ", type);
       print_gimple_stmt (dump_file, stmt, 0, dump_flags);
       fprintf (dump_file, "\n");
     }
@@ -861,11 +960,15 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
      some builtin calls.  */
   if (gimple_call_builtin_p (stmt, BUILT_IN_NORMAL))
     {
-      switch (DECL_FUNCTION_CODE (gimple_call_fndecl (stmt)))
+      tree fndecl = gimple_call_fndecl (stmt);
+      switch (DECL_FUNCTION_CODE (fndecl))
 	{
 	  case BUILT_IN_MEMCPY:
 	  case BUILT_IN_MEMMOVE:
 	  case BUILT_IN_MEMSET:
+	  case BUILT_IN_MEMCPY_CHK:
+	  case BUILT_IN_MEMMOVE_CHK:
+	  case BUILT_IN_MEMSET_CHK:
 	    {
 	      /* Occasionally calls with an explicit length of zero
 		 show up in the IL.  It's pointless to do analysis
@@ -873,10 +976,18 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	      tree size = gimple_call_arg (stmt, 2);
 	      if (integer_zerop (size))
 		{
-		  delete_dead_call (gsi);
+		  delete_dead_or_redundant_call (gsi, "dead");
 		  return;
 		}
 
+	      /* If this is a memset call that initializes an object
+		 to zero, it may be redundant with an earlier memset
+		 or empty CONSTRUCTOR of a larger object.  */
+	      if ((DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MEMSET
+		   || DECL_FUNCTION_CODE (fndecl) == BUILT_IN_MEMSET_CHK)
+		  && integer_zerop (gimple_call_arg (stmt, 1)))
+		dse_optimize_redundant_stores (stmt);
+
 	      enum dse_store_status store_status;
 	      m_byte_tracking_enabled
 		= setup_live_bytes_from_ref (&ref, m_live_bytes);
@@ -893,10 +1004,14 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 		}
 
 	      if (store_status == DSE_STORE_DEAD)
-		delete_dead_call (gsi);
+		delete_dead_or_redundant_call (gsi, "dead");
 	      return;
 	    }
 
+	  case BUILT_IN_CALLOC:
+	    /* We already know the arguments are integer constants.  */
+	    dse_optimize_redundant_stores (stmt);
+
 	  default:
 	    return;
 	}
@@ -906,6 +1021,18 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
     {
       bool by_clobber_p = false;
 
+      /* First see if this store is a CONSTRUCTOR and if there
+	 are subsequent CONSTRUCTOR stores which are totally
+	 subsumed by this statement.  If so remove the subsequent
+	 CONSTRUCTOR store.
+
+	 This will tend to make fewer calls into memset with longer
+	 arguments.  */
+      if (gimple_assign_rhs_code (stmt) == CONSTRUCTOR
+	  && CONSTRUCTOR_NELTS (gimple_assign_rhs1 (stmt)) == 0
+	  && !gimple_clobber_p (stmt))
+	dse_optimize_redundant_stores (stmt);
+
       /* Self-assignments are zombies.  */
       if (operand_equal_p (gimple_assign_rhs1 (stmt),
 			   gimple_assign_lhs (stmt), 0))
@@ -936,7 +1063,7 @@  dse_dom_walker::dse_optimize_stmt (gimple_stmt_iterator *gsi)
 	  && !by_clobber_p)
 	return;
 
-      delete_dead_assignment (gsi);
+      delete_dead_or_redundant_assignment (gsi, "dead");
     }
 }
 

diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr90883.C b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
new file mode 100644
index 00000000000..005b2103b4b
--- /dev/null
+++ b/gcc/testsuite/g++.dg/tree-ssa/pr90883.C
@@ -0,0 +1,19 @@ 
+// { dg-options "-O2 -fdump-tree-dse1-details -std=c++11" }
+
+
+    class C
+    {
+        char a[7]{};
+        int b{};
+    };
+
+    C slow()
+    {
+        return {};
+    }
+
+
+// We want to match enough here to capture that we deleted an empty
+// constructor store
+// { dg-final { scan-tree-dump "Deleted redundant store: .*\.a = {}" "dse1" } }
+
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-36.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-36.c
new file mode 100644
index 00000000000..23a53bb4ad2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-36.c
@@ -0,0 +1,65 @@ 
+/* { dg-options "-O2 -fdump-tree-dse-details -fno-tree-fre" } */
+#include <string.h>
+#include <stdlib.h>
+
+struct X
+{
+  char mem0[10];
+  char mem1[10];
+};
+
+
+void blah (struct X);
+
+
+void
+foo1()
+{
+  struct X x = { };
+  memset (x.mem1, 0, sizeof x.mem1);
+  blah (x);
+}
+
+void
+foo2()
+{
+  struct X x = { };
+  x.mem1[5] = 0;
+  blah (x);
+}
+
+void
+bar1 ()
+{
+  struct X x;
+  memset (&x, 0, sizeof x);
+  memset (&x.mem1, 0, sizeof x.mem1);
+  blah (x);
+}
+void
+bar2 ()
+{
+  struct X x;
+  memset (&x, 0, sizeof x);
+  x.mem1[5] = 0;
+  blah (x);
+}
+
+void
+baz1 ()
+{
+  struct X *x = calloc (sizeof (struct X), 1);
+  memset (&x->mem1, 0, sizeof x->mem1);
+  blah (*x);
+}
+
+void
+baz2 ()
+{
+  struct X *x = calloc (sizeof (struct X), 1);
+  x->mem1[5] = 0;
+  blah (*x);
+}
+/* { dg-final { scan-tree-dump-times "Deleted redundant call" 3 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted redundant store" 3 "dse1" } } */
+