[4/4] SRA: Do not ignore padding when totally scalarizing [PR92486]

Message ID ri6wo9ca0p7.fsf@suse.cz
State New
Series
  • [1/4] SRA: Add verification of accesses

Commit Message

Martin Jambor Jan. 28, 2020, 12:23 a.m.
Hi,

PR 92486 shows that DSE, when seeing a "normal" gimple aggregate
assignment coming from a C struct assignment and one representing a
folded memcpy, can kill the latter and keep in place only the former,
which does not copy padding - at least when SRA decides to totally
scalarize at least one of the aggregates (when not doing total
scalarization, SRA cares about padding).

SRA would not totally scalarize an aggregate if it saw that it takes
part in a gimple assignment which is a folded memcpy (see how
type_changing_p is set in contains_vce_or_bfcref_p), but here it does
not see one because of the DSE decision described above.

I was asked to modify SRA to take padding into account - and to copy
it around - when totally scalarizing, which is what the patch below
does.  I am not very happy about this; I am afraid it will lead to
performance regressions, but this has all been discussed (see
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01185.html and
https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00218.html).

I tried to alleviate the problem by not only inserting accesses for
padding but also by enlarging existing accesses whenever possible to
extend over padding - the extended access would get copied when in the
original IL an aggregate copy is replaced with SRA copies and a
BIT_FIELD_REF would be generated to replace a scalar access to a part
of the aggregate in the original IL.  I have made it work in the sense
that the patch passed bootstrap and testing (see the git branch
refs/users/jamborm/heads/sra-later_total-bfr-20200127 or look at
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=shortlog;h=refs/users/jamborm/heads/sra-later_total-bfr-20200127
if you are interested), but this approach meant that each such
extended replacement which was written to (so all of them) could
potentially become only partially assigned to and so had to be marked
as addressable and could not become a gimple register, meaning that
total scalarization would be creating addressable variables.  Detecting
such cases is not easy; it would mean introducing yet another type of
write flag (written to exactly this access) and propagating that flag
across assignment sub-accesses.

So I decided that was not the way to go and instead only extended
integer accesses, and that is what the patch below does.  Like in the
previous attempt, whatever padding could not be covered by extending
an access would be covered by extra artificial accesses.  As you can
see, it adds a little complexity to various places of the pass which
are already not trivial, but hopefully it is manageable.

Bootstrapped and tested on x86_64-linux, I'll be curious about the
feedback.

Thanks,

Martin


2020-01-27  Martin Jambor  <mjambor@suse.cz>

	PR tree-optimization/92486
	* tree-sra.c: Include langhooks.h
	(struct access): New fields reg_size and reg_acc_type.
	(dump_access): Print new fields.
	(acc_size): New function.
	(find_access_in_subtree): Use it, new parameter reg.
	(get_var_base_offset_size_access): Pass true to
	find_access_in_subtree.
	(create_access_1): Initialize reg_size.
	(create_artificial_child_access): Likewise.
	(create_total_scalarization_access): Likewise.
	(build_ref_for_model): Do not use model expr if reg_acc_type is
	non-NULL.
	(get_reg_access_replacement): New function.
	(verify_sra_access_forest): Adjust verification for presence of
	extended accesses covering padding.
	(analyze_access_subtree): Undo extension over padding if total
	scalarization failed, set grp_partial_lhs if we are going to introduce
	a partial store to the new replacement, do not ignore holes when
	totally scalarizing.
	(sra_type_for_size): New function.
	(total_scalarization_fill_padding): Likewise.
	(total_should_skip_creating_access): Use it.
	(totally_scalarize_subtree): Likewise.
	(sra_modify_expr): Use get_reg_access_replacement instead of
	get_access_replacement, adjust for reg_acc_type.
	(sra_modify_assign): Likewise.
	(load_assign_lhs_subreplacements): Pass false to
	find_access_in_subtree.

	testsuite/
	* gcc.dg/tree-ssa/pr92486.c: New test.
---
 gcc/ChangeLog                           |  32 +++
 gcc/testsuite/ChangeLog                 |   5 +
 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c |  38 +++
 gcc/tree-sra.c                          | 368 +++++++++++++++++++++---
 4 files changed, 396 insertions(+), 47 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c

-- 
2.24.1

Comments

Richard Biener Jan. 29, 2020, 12:58 p.m. | #1
On Tue, 28 Jan 2020, Martin Jambor wrote:

> Hi,

> 

> PR 92486 shows that DSE, when seeing a "normal" gimple aggregate

> assignment coming from a C struct assignment and one representing a

> folded memcpy, can kill the latter and keep in place only the former,

> which does not copy padding - at least when SRA decides to totally

> scalarize at least one of the aggregates (when not doing total

> scalarization, SRA cares about padding)

> 

> SRA would not totally scalarize an aggregate if it saw that it takes

> part in a gimple assignment which is a folded memcpy (see how

> type_changing_p is set in contains_vce_or_bfcref_p) but it doesn't

> because of the DSE decisions.

> 

> I was asked to modify SRA to take padding into account - and to copy

> it around - when totally scalarizing, which is what the patch below

> does.  I am not very happy about this, I am afraid it will lead to

> performance regressions, but this has all been discussed (see

> https://gcc.gnu.org/ml/gcc-patches/2019-12/msg01185.html and

> https://gcc.gnu.org/ml/gcc-patches/2020-01/msg00218.html).

> 

> I tried to alleviate the problem by not only inserting accesses for

> padding but also by enlarging existing accesses whenever possible to

> extend over padding - the extended access would get copied when in the

> original IL an aggregate copy is replaced with SRA copies and a

> BIT_FIELD_REF would be generated to replace a scalar access to a part

> of the aggregate in the original IL.  I have made it work in the sense

> that the patch passed bootstrap and testing (see the git branch

> refs/users/jamborm/heads/sra-later_total-bfr-20200127 or look at

> https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;a=shortlog;h=refs/users/jamborm/heads/sra-later_total-bfr-20200127

> if you are interested), but this approach meant that each such

> extended replacement which was written to (so all of them) could

> potentially become only partially assigned to and so had to be marked

> as addressable and could not become gimple register, meaning that

> total scalarization would be creating addressable variables.  Detecting

> such cases is not easy, it would mean introducing yet another type of

> write flag (written to exactly this access) and propagate that flag

> across assignment sub-accesses.

> 

> So I decided that was not the way to go and instead only extended

> integer accesses, and that is what the patch below does.  Like in the

> previous attempt, whatever padding could not be covered by extending

> an access would be covered by extra artificial accesses.  As you can

> see, it adds a little complexity to various places of the pass which

> are already not trivial, but hopefully it is manageable.

> 

> Bootstrapped and tested on x86_64-linux, I'll be curious about the

> feedback.


 
+static void
+upgrade_integral_size_to_prec (struct access *access)
+{

missing function comment.

I'm almost sure that get_reg_access_replacement gets things wrong
on non-little-endian platforms.  The function comment says something
about bitfield refs but you simply create truncations.

sra_type_for_size shouldn't use lang_hooks.types.type_for_size
but build_nonstandard_integer_type.  In total_scalarization_fill_padding
avoid the loop figuring out the larger type and just up it to
MIN (lastpos alignment, word-mode-size)?

There's PR71460 and a more recent one on total scalarization
creating FP accesses which is probably a bug we'd want to address
first (and which might simplify this patch?).

Overall the approach in the patch looks reasonable and code-generation
even improves in some cases I tried.

As said, I wonder how this integrates with a fix to avoid using
FP types for total scalarization (or any access decomposition
that was a non-FPmode before).

Thanks,
Richard.

> Thanks,

> 

> Martin

> 

> 

> 2020-01-27  Martin Jambor  <mjambor@suse.cz>

> 

> 	PR tree-optimization/92486

> 	* tree-sra.c: Include langhooks.h

> 	(struct access): New fields reg_size and reg_acc_type.

> 	(dump_access): Print new fields.

> 	(acc_size): New function.

> 	(find_access_in_subtree): Use it, new parameter reg.

> 	(get_var_base_offset_size_access): Pass true to

> 	find_access_in_subtree.

> 	(create_access_1): Initialize reg_size.

> 	(create_artificial_child_access): Likewise.

> 	(create_total_scalarization_access): Likewise.

> 	(build_ref_for_model): Do not use model expr if reg_acc_type is

> 	non-NULL.

> 	(get_reg_access_replacement): New function.

> 	(verify_sra_access_forest): Adjust verification for presence of

> 	extended accesses covering padding.

> 	(analyze_access_subtree): Undo extension over padding if total

> 	scalarization failed, set grp_partial_lhs if we are going to introduce

> 	a partial store to the new replacement, do not ignore holes when

> 	totally scalarizing.

> 	(sra_type_for_size): New function.

> 	(total_scalarization_fill_padding): Likewise.

> 	(total_should_skip_creating_access): Use it.

> 	(totally_scalarize_subtree): Likewise.

> 	(sra_modify_expr): Use get_reg_access_replacement instead of

> 	get_access_replacement, adjust for reg_acc_type.

> 	(sra_modify_assign): Likewise.

> 	(load_assign_lhs_subreplacements): Pass false to

> 	find_access_in_subtree.

> 

> 	testsuite/

> 	* gcc.dg/tree-ssa/pr92486.c: New test.

> ---

>  gcc/ChangeLog                           |  32 +++

>  gcc/testsuite/ChangeLog                 |   5 +

>  gcc/testsuite/gcc.dg/tree-ssa/pr92486.c |  38 +++

>  gcc/tree-sra.c                          | 368 +++++++++++++++++++++---

>  4 files changed, 396 insertions(+), 47 deletions(-)

>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr92486.c

> 

> diff --git a/gcc/ChangeLog b/gcc/ChangeLog

> index 3541c6638f9..34c60e6f2a3 100644

> --- a/gcc/ChangeLog

> +++ b/gcc/ChangeLog

> @@ -1,3 +1,35 @@

> +2020-01-26  Martin Jambor  <mjambor@suse.cz>

> +

> +	PR tree-optimization/92486

> +	* tree-sra.c: Include langhooks.h

> +	(struct access): New fields reg_size and reg_acc_type.

> +	(dump_access): Print new fields.

> +	(acc_size): New function.

> +	(find_access_in_subtree): Use it, new parameter reg.

> +	(get_var_base_offset_size_access): Pass true to

> +	find_access_in_subtree.

> +	(create_access_1): Initialize reg_size.

> +	(create_artificial_child_access): Likewise.

> +	(create_total_scalarization_access): Likewise.

> +	(build_ref_for_model): Do not use model expr if reg_acc_type is

> +	non-NULL.

> +	(get_reg_access_replacement): New function.

> +	(verify_sra_access_forest): Adjust verification for presence of

> +	extended accesses covering padding.

> +	(analyze_access_subtree): Undo extension over padding if total

> +	scalarization failed, set grp_partial_lhs if we are going to introduce

> +	a partial store to the new replacement, do not ignore holes when

> +	totally scalarizing.

> +	(sra_type_for_size): New function.

> +	(total_scalarization_fill_padding): Likewise.

> +	(total_should_skip_creating_access): Use it.

> +	(totally_scalarize_subtree): Likewise.

> +	(sra_modify_expr): Use get_reg_access_replacement instead of

> +	get_access_replacement, adjust for reg_acc_type.

> +	(sra_modify_assign): Likewise.

> +	(load_assign_lhs_subreplacements): Pass false to

> +	find_access_in_subtree.

> +

>  2020-01-27  Martin Liska  <mliska@suse.cz>

>  

>  	PR driver/91220

> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog

> index 22a37dd1ab2..7ccb5098224 100644

> --- a/gcc/testsuite/ChangeLog

> +++ b/gcc/testsuite/ChangeLog

> @@ -1,3 +1,8 @@

> +2020-01-24  Martin Jambor  <mjambor@suse.cz>

> +

> +	PR tree-optimization/92486

> +	* gcc.dg/tree-ssa/pr92486.c: New test.

> +

>  2020-01-27  Martin Liska  <mliska@suse.cz>

>  

>  	PR target/93274

> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c

> new file mode 100644

> index 00000000000..77e84241eff

> --- /dev/null

> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c

> @@ -0,0 +1,38 @@

> +/* { dg-do run } */

> +/* { dg-options "-O1" } */

> +

> +struct s {

> +    char c;

> +    int i;

> +};

> +

> +__attribute__((noipa))

> +void f(struct s *p, struct s *q)

> +{

> +    struct s w;

> +

> +    __builtin_memset(&w, 0, sizeof(struct s));

> +    w = *q;

> +

> +    __builtin_memset(p, 0, sizeof(struct s));

> +    *p = w;

> +}

> +

> +int main()

> +{

> +    struct s x;

> +    __builtin_memset(&x, 1, sizeof(struct s));

> +

> +    struct s y;

> +    __builtin_memset(&y, 2, sizeof(struct s));

> +

> +    f(&y, &x);

> +

> +    for (unsigned char *p = (unsigned char *)&y;

> +	 p < (unsigned char *)&y + sizeof(struct s);

> +	 p++)

> +      if (*p != 1)

> +	__builtin_abort ();

> +

> +    return 0;

> +}

> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c

> index ea8594db193..bda342ffdf9 100644

> --- a/gcc/tree-sra.c

> +++ b/gcc/tree-sra.c

> @@ -99,6 +99,7 @@ along with GCC; see the file COPYING3.  If not see

>  #include "dbgcnt.h"

>  #include "builtins.h"

>  #include "tree-sra.h"

> +#include "langhooks.h"

>  

>  

>  /* Enumeration of all aggregate reductions we can do.  */

> @@ -130,11 +131,16 @@ struct assign_link;

>  

>  struct access

>  {

> -  /* Values returned by  `get_ref_base_and_extent' for each component reference

> -     If EXPR isn't a component reference  just set `BASE = EXPR', `OFFSET = 0',

> -     `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */

> +  /* Offset, size and base are values returned by `get_ref_base_and_extent' for

> +     each component reference.  If EXPR isn't a component reference just set

> +     `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */

>    HOST_WIDE_INT offset;

>    HOST_WIDE_INT size;

> +

> +  /* If reg_acc_type is non-NULL, this is the size of the actual register type

> +     the access represented before it was extended to cover padding.  Otherwise

> +     must be equal to size.  */

> +  HOST_WIDE_INT reg_size;

>    tree base;

>  

>    /* Expression.  It is context dependent so do not use it to create new

> @@ -144,6 +150,12 @@ struct access

>    /* Type.  */

>    tree type;

>  

> +  /* If non-NULL, this is the type that should be actually extracted or

> +     inserted from/to the replacement decl when replacing accesses to the

> +     individual field itself (as opposed to accesses created as part of

> +     replacing aggregate copies which should use TYPE).  */

> +  tree reg_acc_type;

> +

>    /* The statement this access belongs to.  */

>    gimple *stmt;

>  

> @@ -391,10 +403,17 @@ dump_access (FILE *f, struct access *access, bool grp)

>    print_generic_expr (f, access->base);

>    fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset);

>    fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size);

> +  if (access->reg_size != access->size)

> +    fprintf (f, ", reg_size = " HOST_WIDE_INT_PRINT_DEC, access->reg_size);

>    fprintf (f, ", expr = ");

>    print_generic_expr (f, access->expr);

>    fprintf (f, ", type = ");

>    print_generic_expr (f, access->type);

> +  if (access->reg_acc_type)

> +    {

> +      fprintf (f, ", reg_acc_type = ");

> +      print_generic_expr (f, access->reg_acc_type);

> +    }

>    fprintf (f, ", reverse = %d", access->reverse);

>    if (grp)

>      fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, "

> @@ -481,18 +500,27 @@ get_base_access_vector (tree base)

>    return base_access_vec->get (base);

>  }

>  

> +/* Return ACCESS's size or reg_size depending on REG.  */

> +

> +static HOST_WIDE_INT

> +acc_size (struct access *access, bool reg)

> +{

> +  return reg ? access->reg_size : access->size;

> +}

> +

>  /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted

>     in ACCESS.  Return NULL if it cannot be found.  */

>  

>  static struct access *

>  find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,

> -			HOST_WIDE_INT size)

> +			HOST_WIDE_INT size, bool reg)

>  {

> -  while (access && (access->offset != offset || access->size != size))

> +  while (access && (access->offset != offset

> +		    || acc_size (access, reg) != size))

>      {

>        struct access *child = access->first_child;

>  

> -      while (child && (child->offset + child->size <= offset))

> +      while (child && (child->offset + acc_size (child, reg) <= offset))

>  	child = child->next_sibling;

>        access = child;

>      }

> @@ -503,7 +531,7 @@ find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,

>    if (access)

>      while (access->first_child

>  	   && access->first_child->offset == offset

> -	   && access->first_child->size == size)

> +	   && acc_size (access->first_child, reg) == size)

>        access = access->first_child;

>  

>    return access;

> @@ -539,7 +567,7 @@ get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset,

>    if (!access)

>      return NULL;

>  

> -  return find_access_in_subtree (access, offset, size);

> +  return find_access_in_subtree (access, offset, size, true);

>  }

>  

>  /* Add LINK to the linked list of assign links of RACC.  */

> @@ -868,6 +896,7 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)

>    access->base = base;

>    access->offset = offset;

>    access->size = size;

> +  access->reg_size = size;

>  

>    base_access_vec->get_or_insert (base).safe_push (access);

>  

> @@ -1667,6 +1696,7 @@ build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,

>      {

>        tree res;

>        if (model->grp_same_access_path

> +	  && !model->reg_acc_type

>  	  && !TREE_THIS_VOLATILE (base)

>  	  && (TYPE_ADDR_SPACE (TREE_TYPE (base))

>  	      == TYPE_ADDR_SPACE (TREE_TYPE (model->expr)))

> @@ -2256,6 +2286,30 @@ get_access_replacement (struct access *access)

>    return access->replacement_decl;

>  }

>  

> +/* Like above except in cases when ACCESS has non-NULL reg_acc_type, in which

> +   case also construct BIT_FIELD_REF into the replacement of the corresponding

> +   type (with location LOC).  */

> +

> +static tree

> +get_reg_access_replacement (location_t loc, struct access *access, bool write,

> +			    gassign **conversion)

> +{

> +  tree repl = get_access_replacement (access);

> +  if (!access->reg_acc_type)

> +    {

> +      *conversion = NULL;

> +      return repl;

> +    }

> +

> +  tree tmp = make_ssa_name (access->reg_acc_type);

> +  if (write)

> +    *conversion = gimple_build_assign (repl, NOP_EXPR, tmp);

> +  else

> +    *conversion = gimple_build_assign (tmp, NOP_EXPR, repl);

> +  gimple_set_location (*conversion, loc);

> +  return tmp;

> +}

> +

>  

>  /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the

>     linked list along the way.  Stop when *ACCESS is NULL or the access pointed

> @@ -2339,7 +2393,12 @@ verify_sra_access_forest (struct access *root)

>        gcc_assert (offset == access->offset);

>        gcc_assert (access->grp_unscalarizable_region

>  		  || size == max_size);

> -      gcc_assert (max_size == access->size);

> +      if (access->reg_acc_type)

> +	gcc_assert (max_size == access->reg_size

> +		    && max_size < access->size);

> +      else

> +	gcc_assert (max_size == access->size

> +		    && max_size == access->reg_size);

>        gcc_assert (reverse == access->reverse);

>  

>        if (access->first_child)

> @@ -2405,6 +2464,39 @@ expr_with_var_bounded_array_refs_p (tree expr)

>    return false;

>  }

>  

> +static void

> +upgrade_integral_size_to_prec (struct access *access)

> +{

> +  /* Always create access replacements that cover the whole access.

> +     For integral types this means the precision has to match.

> +     Avoid assumptions based on the integral type kind, too.  */

> +  if (INTEGRAL_TYPE_P (access->type)

> +      && (TREE_CODE (access->type) != INTEGER_TYPE

> +	  || TYPE_PRECISION (access->type) != access->size)

> +      /* But leave bitfield accesses alone.  */

> +      && (TREE_CODE (access->expr) != COMPONENT_REF

> +	  || !DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1))))

> +    {

> +      tree rt = access->type;

> +      gcc_assert ((access->offset % BITS_PER_UNIT) == 0

> +		  && (access->size % BITS_PER_UNIT) == 0);

> +      access->type = build_nonstandard_integer_type (access->size,

> +						   TYPE_UNSIGNED (rt));

> +      access->expr = build_ref_for_offset (UNKNOWN_LOCATION, access->base,

> +					 access->offset, access->reverse,

> +					 access->type, NULL, false);

> +

> +      if (dump_file && (dump_flags & TDF_DETAILS))

> +	{

> +	  fprintf (dump_file, "Changing the type of a replacement for ");

> +	  print_generic_expr (dump_file, access->base);

> +	  fprintf (dump_file, " offset: %u, size: %u ",

> +		   (unsigned) access->offset, (unsigned) access->size);

> +	  fprintf (dump_file, " to an integer.\n");

> +	}

> +    }

> +}

> +

>  /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when

>     both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  If TOTALLY

>     is set, we are totally scalarizing the aggregate.  Also set all sorts of

> @@ -2488,6 +2580,15 @@ analyze_access_subtree (struct access *root, struct access *parent,

>  	hole = true;

>      }

>  

> +  if (!totally && root->reg_acc_type)

> +    {

> +      /* If total scalarization did not eventually succeed, let's undo any

> +	 futile attempts to cover padding.  */

> +      root->type = root->reg_acc_type;

> +      root->size = root->reg_size;

> +      root->reg_acc_type = NULL_TREE;

> +    }

> +

>    if (allow_replacements && scalar && !root->first_child

>        && (totally || !root->grp_total_scalarization)

>        && (totally

> @@ -2495,34 +2596,7 @@ analyze_access_subtree (struct access *root, struct access *parent,

>  	  || ((root->grp_scalar_read || root->grp_assignment_read)

>  	      && (root->grp_scalar_write || root->grp_assignment_write))))

>      {

> -      /* Always create access replacements that cover the whole access.

> -         For integral types this means the precision has to match.

> -	 Avoid assumptions based on the integral type kind, too.  */

> -      if (INTEGRAL_TYPE_P (root->type)

> -	  && (TREE_CODE (root->type) != INTEGER_TYPE

> -	      || TYPE_PRECISION (root->type) != root->size)

> -	  /* But leave bitfield accesses alone.  */

> -	  && (TREE_CODE (root->expr) != COMPONENT_REF

> -	      || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1))))

> -	{

> -	  tree rt = root->type;

> -	  gcc_assert ((root->offset % BITS_PER_UNIT) == 0

> -		      && (root->size % BITS_PER_UNIT) == 0);

> -	  root->type = build_nonstandard_integer_type (root->size,

> -						       TYPE_UNSIGNED (rt));

> -	  root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base,

> -					     root->offset, root->reverse,

> -					     root->type, NULL, false);

> -

> -	  if (dump_file && (dump_flags & TDF_DETAILS))

> -	    {

> -	      fprintf (dump_file, "Changing the type of a replacement for ");

> -	      print_generic_expr (dump_file, root->base);

> -	      fprintf (dump_file, " offset: %u, size: %u ",

> -		       (unsigned) root->offset, (unsigned) root->size);

> -	      fprintf (dump_file, " to an integer.\n");

> -	    }

> -	}

> +      upgrade_integral_size_to_prec (root);

>  

>        root->grp_to_be_replaced = 1;

>        root->replacement_decl = create_access_replacement (root);

> @@ -2554,7 +2628,7 @@ analyze_access_subtree (struct access *root, struct access *parent,

>  	root->grp_total_scalarization = 0;

>      }

>  

> -  if (!hole || totally)

> +  if (!hole)

>      root->grp_covered = 1;

>    else if (root->grp_write || comes_initialized_p (root->base))

>      root->grp_unscalarized_data = 1; /* not covered and written to */

> @@ -2636,6 +2710,8 @@ create_artificial_child_access (struct access *parent, struct access *model,

>    access->expr = expr;

>    access->offset = new_offset;

>    access->size = model->size;

> +  gcc_assert (model->size == model->reg_size);

> +  access->reg_size = model->reg_size;

>    access->type = model->type;

>    access->parent = parent;

>    access->grp_read = set_grp_read;

> @@ -2966,6 +3042,7 @@ create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos,

>    access->base = parent->base;

>    access->offset = pos;

>    access->size = size;

> +  access->reg_size = size;

>    access->expr = expr;

>    access->type = type;

>    access->parent = parent;

> @@ -3015,6 +3092,138 @@ create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos,

>    return new_acc;

>  }

>  

> +/* Given SIZE return the integer type to represent that many bits or NULL if

> +   there is no suitable one.  */

> +

> +static tree

> +sra_type_for_size (HOST_WIDE_INT size, int unsignedp = 1)

> +{

> +  tree type = lang_hooks.types.type_for_size (size, unsignedp);

> +  scalar_int_mode mode;

> +  if (type

> +      && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)

> +      && GET_MODE_BITSIZE (mode) == size)

> +    return type;

> +  else

> +    return NULL_TREE;

> +}

> +

> +/* Create a series of accesses to cover padding from FROM to TO and chain them

> +   between LAST_PTR and NEXT_CHILD.  Return the pointer to the last created

> +   one.  */

> +

> +static struct access *

> +fill_padding_with_accesses (struct access *parent, HOST_WIDE_INT from,

> +			    HOST_WIDE_INT to, struct access **last_ptr,

> +			    struct access *next_child)

> +{

> +  struct access *access = NULL;

> +

> +  gcc_assert (from < to);

> +  do

> +    {

> +      HOST_WIDE_INT diff = to - from;

> +      gcc_assert (diff >= BITS_PER_UNIT);

> +      HOST_WIDE_INT stsz = 1 << floor_log2 (diff);

> +      tree type;

> +

> +      while (true)

> +	{

> +	  type = sra_type_for_size (stsz);

> +	  if (type)

> +	    break;

> +	  stsz /= 2;

> +	  gcc_checking_assert (stsz >= BITS_PER_UNIT);

> +	}

> +

> +      do {

> +	tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base,

> +					  from, parent->reverse, type, NULL,

> +					  false);

> +	access = create_total_scalarization_access (parent, from, stsz, type,

> +						    expr, last_ptr, next_child);

> +	access->grp_no_warning = 1;

> +	from += stsz;

> +      }

> +      while (to - from >= stsz);

> +      gcc_assert (from <= to);

> +    }

> +  while (from < to);

> +  return access;

> +}

> +

> +/* Check whether any padding between FROM and TO must be filled during

> +   total scalarization and if so, extend *LAST_SIBLING, create additional

> +   artificial accesses under PARENT or both to fill it.  Set *LAST_SIBLING to

> +   the last created access and set its next_sibling to NEXT_CHILD.  */

> +

> +static void

> +total_scalarization_fill_padding (struct access *parent,

> +				  struct access **last_sibling,

> +				  HOST_WIDE_INT from, HOST_WIDE_INT to,

> +				  struct access *next_child)

> +{

> +  if (constant_decl_p (parent->base))

> +    return;

> +

> +  gcc_assert (from <= to);

> +  if (from == to)

> +    return;

> +

> +  if (struct access *ls = *last_sibling)

> +    {

> +      /* First, perform any upgrades to full integers we would have done

> +	 anyway.  */

> +      upgrade_integral_size_to_prec (ls);

> +

> +      HOST_WIDE_INT lastsize = ls->size;

> +      if (INTEGRAL_TYPE_P (ls->type)

> +	  && pow2_or_zerop (lastsize))

> +	{

> +	  /* The last field was a reasonably sized integer, let's try to

> +	     enlarge as much as we can so that it is still naturally

> +	     aligned.  */

> +	  HOST_WIDE_INT lastpos = ls->offset;

> +	  HOST_WIDE_INT blocksize = lastsize;

> +	  tree type = NULL_TREE;

> +	  int unsignedp = TYPE_UNSIGNED (ls->type);

> +

> +	  while (true)

> +	    {

> +	      HOST_WIDE_INT b2 = blocksize * 2;

> +	      if (lastpos + b2 > to

> +		  || (lastpos % b2) != 0)

> +		break;

> +	      tree t2 = sra_type_for_size (b2, unsignedp);

> +	      if (!t2)

> +		break;

> +

> +	      blocksize = b2;

> +	      type = t2;

> +	    }

> +

> +	  if (blocksize != lastsize)

> +	    {

> +	      ls->reg_acc_type = ls->type;

> +	      ls->type = type;

> +	      ls->size = blocksize;

> +	      from = lastpos + blocksize;

> +	    }

> +

> +	  if (from == to)

> +	    return;

> +	}

> +    }

> +

> +  struct access **last_ptr;

> +  if (*last_sibling)

> +    last_ptr = &(*last_sibling)->next_sibling;

> +  else

> +    last_ptr = &parent->first_child;

> +  *last_sibling = fill_padding_with_accesses (parent, from, to, last_ptr,

> +					      next_child);

> +}

> +

>  static bool totally_scalarize_subtree (struct access *root);

>  

>  /* Return true if INNER is either the same type as OUTER or if it is the type

> @@ -3073,10 +3282,17 @@ total_should_skip_creating_access (struct access *parent,

>  				   HOST_WIDE_INT size)

>  {

>    struct access *next_child;

> +  HOST_WIDE_INT covered;

>    if (!*last_seen_sibling)

> -    next_child = parent->first_child;

> +    {

> +      next_child = parent->first_child;

> +      covered = parent->offset;

> +    }

>    else

> -    next_child = (*last_seen_sibling)->next_sibling;

> +    {

> +      next_child = (*last_seen_sibling)->next_sibling;


-- 
Richard Biener <rguenther@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Patch

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3541c6638f9..34c60e6f2a3 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,35 @@ 
+2020-01-26  Martin Jambor  <mjambor@suse.cz>
+
+	PR tree-optimization/92486
+	* tree-sra.c: Include langhooks.h.
+	(struct access): New fields reg_size and reg_acc_type.
+	(dump_access): Print new fields.
+	(acc_size): New function.
+	(find_access_in_subtree): Use it, new parameter reg.
+	(get_var_base_offset_size_access): Pass true to
+	find_access_in_subtree.
+	(create_access_1): Initialize reg_size.
+	(create_artificial_child_access): Likewise.
+	(create_total_scalarization_access): Likewise.
+	(build_ref_for_model): Do not use model expr if reg_acc_type is
+	non-NULL.
+	(get_reg_access_replacement): New function.
+	(verify_sra_access_forest): Adjust verification for presence of
+	extended accesses covering padding.
+	(analyze_access_subtree): Undo extension over padding if total
+	scalarization failed, set grp_partial_lhs if we are going to introduce
+	a partial store to the new replacement, do not ignore holes when
+	totally scalarizing.
+	(sra_type_for_size): New function.
+	(total_scalarization_fill_padding): Likewise.
+	(total_should_skip_creating_access): Use it.
+	(totally_scalarize_subtree): Likewise.
+	(sra_modify_expr): Use get_reg_access_replacement instead of
+	get_access_replacement, adjust for reg_acc_type.
+	(sra_modify_assign): Likewise.
+	(load_assign_lhs_subreplacements): Pass false to
+	find_access_in_subtree.
+
 2020-01-27  Martin Liska  <mliska@suse.cz>
 
 	PR driver/91220
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 22a37dd1ab2..7ccb5098224 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@ 
+2020-01-24  Martin Jambor  <mjambor@suse.cz>
+
+	PR tree-optimization/92486
+	* gcc.dg/tree-ssa/pr92486.c: New test.
+
 2020-01-27  Martin Liska  <mliska@suse.cz>
 
 	PR target/93274
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c
new file mode 100644
index 00000000000..77e84241eff
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr92486.c
@@ -0,0 +1,38 @@ 
+/* { dg-do run } */
+/* { dg-options "-O1" } */
+
+struct s {
+    char c;
+    int i;
+};
+
+__attribute__((noipa))
+void f(struct s *p, struct s *q)
+{
+    struct s w;
+
+    __builtin_memset(&w, 0, sizeof(struct s));
+    w = *q;
+
+    __builtin_memset(p, 0, sizeof(struct s));
+    *p = w;
+}
+
+int main()
+{
+    struct s x;
+    __builtin_memset(&x, 1, sizeof(struct s));
+
+    struct s y;
+    __builtin_memset(&y, 2, sizeof(struct s));
+
+    f(&y, &x);
+
+    for (unsigned char *p = (unsigned char *)&y;
+	 p < (unsigned char *)&y + sizeof(struct s);
+	 p++)
+      if (*p != 1)
+	__builtin_abort ();
+
+    return 0;
+}
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index ea8594db193..bda342ffdf9 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -99,6 +99,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "dbgcnt.h"
 #include "builtins.h"
 #include "tree-sra.h"
+#include "langhooks.h"
 
 
 /* Enumeration of all aggregate reductions we can do.  */
@@ -130,11 +131,16 @@  struct assign_link;
 
 struct access
 {
-  /* Values returned by  `get_ref_base_and_extent' for each component reference
-     If EXPR isn't a component reference  just set `BASE = EXPR', `OFFSET = 0',
-     `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */
+  /* Offset, size and base are values returned by `get_ref_base_and_extent' for
+     each component reference.  If EXPR isn't a component reference, just set
+     `BASE = EXPR', `OFFSET = 0', `SIZE = TREE_SIZE (TREE_TYPE (expr))'.  */
   HOST_WIDE_INT offset;
   HOST_WIDE_INT size;
+
+  /* If reg_acc_type is non-NULL, this is the size of the actual register type
+     the access represented before it was extended to cover padding.  Otherwise
+     it must be equal to size.  */
+  HOST_WIDE_INT reg_size;
   tree base;
 
   /* Expression.  It is context dependent so do not use it to create new
@@ -144,6 +150,12 @@  struct access
   /* Type.  */
   tree type;
 
+  /* If non-NULL, this is the type that should be actually extracted or
+     inserted from/to the replacement decl when replacing accesses to the
+     individual field itself (as opposed to accesses created as part of
+     replacing aggregate copies which should use TYPE).  */
+  tree reg_acc_type;
+
   /* The statement this access belongs to.  */
   gimple *stmt;
 
@@ -391,10 +403,17 @@  dump_access (FILE *f, struct access *access, bool grp)
   print_generic_expr (f, access->base);
   fprintf (f, "', offset = " HOST_WIDE_INT_PRINT_DEC, access->offset);
   fprintf (f, ", size = " HOST_WIDE_INT_PRINT_DEC, access->size);
+  if (access->reg_size != access->size)
+    fprintf (f, ", reg_size = " HOST_WIDE_INT_PRINT_DEC, access->reg_size);
   fprintf (f, ", expr = ");
   print_generic_expr (f, access->expr);
   fprintf (f, ", type = ");
   print_generic_expr (f, access->type);
+  if (access->reg_acc_type)
+    {
+      fprintf (f, ", reg_acc_type = ");
+      print_generic_expr (f, access->reg_acc_type);
+    }
   fprintf (f, ", reverse = %d", access->reverse);
   if (grp)
     fprintf (f, ", grp_read = %d, grp_write = %d, grp_assignment_read = %d, "
@@ -481,18 +500,27 @@  get_base_access_vector (tree base)
   return base_access_vec->get (base);
 }
 
+/* Return ACCESS's size or reg_size depending on REG.  */
+
+static HOST_WIDE_INT
+acc_size (struct access *access, bool reg)
+{
+  return reg ? access->reg_size : access->size;
+}
+
 /* Find an access with required OFFSET and SIZE in a subtree of accesses rooted
    in ACCESS.  Return NULL if it cannot be found.  */
 
 static struct access *
 find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
-			HOST_WIDE_INT size)
+			HOST_WIDE_INT size, bool reg)
 {
-  while (access && (access->offset != offset || access->size != size))
+  while (access && (access->offset != offset
+		    || acc_size (access, reg) != size))
     {
       struct access *child = access->first_child;
 
-      while (child && (child->offset + child->size <= offset))
+      while (child && (child->offset + acc_size (child, reg) <= offset))
 	child = child->next_sibling;
       access = child;
     }
@@ -503,7 +531,7 @@  find_access_in_subtree (struct access *access, HOST_WIDE_INT offset,
   if (access)
     while (access->first_child
 	   && access->first_child->offset == offset
-	   && access->first_child->size == size)
+	   && acc_size (access->first_child, reg) == size)
       access = access->first_child;
 
   return access;
@@ -539,7 +567,7 @@  get_var_base_offset_size_access (tree base, HOST_WIDE_INT offset,
   if (!access)
     return NULL;
 
-  return find_access_in_subtree (access, offset, size);
+  return find_access_in_subtree (access, offset, size, true);
 }
 
 /* Add LINK to the linked list of assign links of RACC.  */
@@ -868,6 +896,7 @@  create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
   access->base = base;
   access->offset = offset;
   access->size = size;
+  access->reg_size = size;
 
   base_access_vec->get_or_insert (base).safe_push (access);
 
@@ -1667,6 +1696,7 @@  build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
     {
       tree res;
       if (model->grp_same_access_path
+	  && !model->reg_acc_type
 	  && !TREE_THIS_VOLATILE (base)
 	  && (TYPE_ADDR_SPACE (TREE_TYPE (base))
 	      == TYPE_ADDR_SPACE (TREE_TYPE (model->expr)))
@@ -2256,6 +2286,30 @@  get_access_replacement (struct access *access)
   return access->replacement_decl;
 }
 
+/* Like above, except when ACCESS has a non-NULL reg_acc_type: then also
+   generate a conversion statement (with location LOC) between the replacement
+   and a new SSA name of that type, store it to *CONVERSION and return it.  */
+
+static tree
+get_reg_access_replacement (location_t loc, struct access *access, bool write,
+			    gassign **conversion)
+{
+  tree repl = get_access_replacement (access);
+  if (!access->reg_acc_type)
+    {
+      *conversion = NULL;
+      return repl;
+    }
+
+  tree tmp = make_ssa_name (access->reg_acc_type);
+  if (write)
+    *conversion = gimple_build_assign (repl, NOP_EXPR, tmp);
+  else
+    *conversion = gimple_build_assign (tmp, NOP_EXPR, repl);
+  gimple_set_location (*conversion, loc);
+  return tmp;
+}
+
 
 /* Build a subtree of accesses rooted in *ACCESS, and move the pointer in the
    linked list along the way.  Stop when *ACCESS is NULL or the access pointed
@@ -2339,7 +2393,12 @@  verify_sra_access_forest (struct access *root)
       gcc_assert (offset == access->offset);
       gcc_assert (access->grp_unscalarizable_region
 		  || size == max_size);
-      gcc_assert (max_size == access->size);
+      if (access->reg_acc_type)
+	gcc_assert (max_size == access->reg_size
+		    && max_size < access->size);
+      else
+	gcc_assert (max_size == access->size
+		    && max_size == access->reg_size);
       gcc_assert (reverse == access->reverse);
 
       if (access->first_child)
@@ -2405,6 +2464,39 @@  expr_with_var_bounded_array_refs_p (tree expr)
   return false;
 }
 
+static void
+upgrade_integral_size_to_prec (struct access *access)
+{
+  /* Always create access replacements that cover the whole access.
+     For integral types this means the precision has to match.
+     Avoid assumptions based on the integral type kind, too.  */
+  if (INTEGRAL_TYPE_P (access->type)
+      && (TREE_CODE (access->type) != INTEGER_TYPE
+	  || TYPE_PRECISION (access->type) != access->size)
+      /* But leave bitfield accesses alone.  */
+      && (TREE_CODE (access->expr) != COMPONENT_REF
+	  || !DECL_BIT_FIELD (TREE_OPERAND (access->expr, 1))))
+    {
+      tree rt = access->type;
+      gcc_assert ((access->offset % BITS_PER_UNIT) == 0
+		  && (access->size % BITS_PER_UNIT) == 0);
+      access->type = build_nonstandard_integer_type (access->size,
+						   TYPE_UNSIGNED (rt));
+      access->expr = build_ref_for_offset (UNKNOWN_LOCATION, access->base,
+					 access->offset, access->reverse,
+					 access->type, NULL, false);
+
+      if (dump_file && (dump_flags & TDF_DETAILS))
+	{
+	  fprintf (dump_file, "Changing the type of a replacement for ");
+	  print_generic_expr (dump_file, access->base);
+	  fprintf (dump_file, " offset: %u, size: %u ",
+		   (unsigned) access->offset, (unsigned) access->size);
+	  fprintf (dump_file, " to an integer.\n");
+	}
+    }
+}
+
 /* Analyze the subtree of accesses rooted in ROOT, scheduling replacements when
    both seeming beneficial and when ALLOW_REPLACEMENTS allows it.  If TOTALLY
    is set, we are totally scalarizing the aggregate.  Also set all sorts of
@@ -2488,6 +2580,15 @@  analyze_access_subtree (struct access *root, struct access *parent,
 	hole = true;
     }
 
+  if (!totally && root->reg_acc_type)
+    {
+      /* If total scalarization did not eventually succeed, let's undo any
+	 futile attempts to cover padding.  */
+      root->type = root->reg_acc_type;
+      root->size = root->reg_size;
+      root->reg_acc_type = NULL_TREE;
+    }
+
   if (allow_replacements && scalar && !root->first_child
       && (totally || !root->grp_total_scalarization)
       && (totally
@@ -2495,34 +2596,7 @@  analyze_access_subtree (struct access *root, struct access *parent,
 	  || ((root->grp_scalar_read || root->grp_assignment_read)
 	      && (root->grp_scalar_write || root->grp_assignment_write))))
     {
-      /* Always create access replacements that cover the whole access.
-         For integral types this means the precision has to match.
-	 Avoid assumptions based on the integral type kind, too.  */
-      if (INTEGRAL_TYPE_P (root->type)
-	  && (TREE_CODE (root->type) != INTEGER_TYPE
-	      || TYPE_PRECISION (root->type) != root->size)
-	  /* But leave bitfield accesses alone.  */
-	  && (TREE_CODE (root->expr) != COMPONENT_REF
-	      || !DECL_BIT_FIELD (TREE_OPERAND (root->expr, 1))))
-	{
-	  tree rt = root->type;
-	  gcc_assert ((root->offset % BITS_PER_UNIT) == 0
-		      && (root->size % BITS_PER_UNIT) == 0);
-	  root->type = build_nonstandard_integer_type (root->size,
-						       TYPE_UNSIGNED (rt));
-	  root->expr = build_ref_for_offset (UNKNOWN_LOCATION, root->base,
-					     root->offset, root->reverse,
-					     root->type, NULL, false);
-
-	  if (dump_file && (dump_flags & TDF_DETAILS))
-	    {
-	      fprintf (dump_file, "Changing the type of a replacement for ");
-	      print_generic_expr (dump_file, root->base);
-	      fprintf (dump_file, " offset: %u, size: %u ",
-		       (unsigned) root->offset, (unsigned) root->size);
-	      fprintf (dump_file, " to an integer.\n");
-	    }
-	}
+      upgrade_integral_size_to_prec (root);
 
       root->grp_to_be_replaced = 1;
       root->replacement_decl = create_access_replacement (root);
@@ -2554,7 +2628,7 @@  analyze_access_subtree (struct access *root, struct access *parent,
 	root->grp_total_scalarization = 0;
     }
 
-  if (!hole || totally)
+  if (!hole)
     root->grp_covered = 1;
   else if (root->grp_write || comes_initialized_p (root->base))
     root->grp_unscalarized_data = 1; /* not covered and written to */
@@ -2636,6 +2710,8 @@  create_artificial_child_access (struct access *parent, struct access *model,
   access->expr = expr;
   access->offset = new_offset;
   access->size = model->size;
+  gcc_assert (model->size == model->reg_size);
+  access->reg_size = model->reg_size;
   access->type = model->type;
   access->parent = parent;
   access->grp_read = set_grp_read;
@@ -2966,6 +3042,7 @@  create_total_scalarization_access (struct access *parent, HOST_WIDE_INT pos,
   access->base = parent->base;
   access->offset = pos;
   access->size = size;
+  access->reg_size = size;
   access->expr = expr;
   access->type = type;
   access->parent = parent;
@@ -3015,6 +3092,138 @@  create_total_access_and_reshape (struct access *parent, HOST_WIDE_INT pos,
   return new_acc;
 }
 
+/* Given SIZE return the integer type to represent that many bits or NULL if
+   there is no suitable one.  */
+
+static tree
+sra_type_for_size (HOST_WIDE_INT size, int unsignedp = 1)
+{
+  tree type = lang_hooks.types.type_for_size (size, unsignedp);
+  scalar_int_mode mode;
+  if (type
+      && is_a <scalar_int_mode> (TYPE_MODE (type), &mode)
+      && GET_MODE_BITSIZE (mode) == size)
+    return type;
+  else
+    return NULL_TREE;
+}
+
+/* Create a series of accesses to cover padding from FROM to TO and chain them
+   between LAST_PTR and NEXT_CHILD.  Return the pointer to the last created
+   one.  */
+
+static struct access *
+fill_padding_with_accesses (struct access *parent, HOST_WIDE_INT from,
+			    HOST_WIDE_INT to, struct access **last_ptr,
+			    struct access *next_child)
+{
+  struct access *access = NULL;
+
+  gcc_assert (from < to);
+  do
+    {
+      HOST_WIDE_INT diff = to - from;
+      gcc_assert (diff >= BITS_PER_UNIT);
+      HOST_WIDE_INT stsz = 1 << floor_log2 (diff);
+      tree type;
+
+      while (true)
+	{
+	  type = sra_type_for_size (stsz);
+	  if (type)
+	    break;
+	  stsz /= 2;
+	  gcc_checking_assert (stsz >= BITS_PER_UNIT);
+	}
+
+      do {
+	tree expr = build_ref_for_offset (UNKNOWN_LOCATION, parent->base,
+					  from, parent->reverse, type, NULL,
+					  false);
+	access = create_total_scalarization_access (parent, from, stsz, type,
+						    expr, last_ptr, next_child);
+	access->grp_no_warning = 1;
+	from += stsz;
+      }
+      while (to - from >= stsz);
+      gcc_assert (from <= to);
+    }
+  while (from < to);
+  return access;
+}
+
+/* Check whether any padding between FROM and TO must be filled during total
+   scalarization and if so, extend *LAST_SIBLING, create additional artificial
+   accesses under PARENT, or do both, to fill it.  Set *LAST_SIBLING to the
+   last created access and set its next_sibling to NEXT_CHILD.  */
+
+static void
+total_scalarization_fill_padding (struct access *parent,
+				  struct access **last_sibling,
+				  HOST_WIDE_INT from, HOST_WIDE_INT to,
+				  struct access *next_child)
+{
+  if (constant_decl_p (parent->base))
+    return;
+
+  gcc_assert (from <= to);
+  if (from == to)
+    return;
+
+  if (struct access *ls = *last_sibling)
+    {
+      /* First, perform any upgrades to full integers we would have done
+	 anyway.  */
+      upgrade_integral_size_to_prec (ls);
+
+      HOST_WIDE_INT lastsize = ls->size;
+      if (INTEGRAL_TYPE_P (ls->type)
+	  && pow2_or_zerop (lastsize))
+	{
+	  /* The last field was a reasonably sized integer; let's try to
+	     enlarge it as much as we can so that it stays naturally
+	     aligned.  */
+	  HOST_WIDE_INT lastpos = ls->offset;
+	  HOST_WIDE_INT blocksize = lastsize;
+	  tree type = NULL_TREE;
+	  int unsignedp = TYPE_UNSIGNED (ls->type);
+
+	  while (true)
+	    {
+	      HOST_WIDE_INT b2 = blocksize * 2;
+	      if (lastpos + b2 > to
+		  || (lastpos % b2) != 0)
+		break;
+	      tree t2 = sra_type_for_size (b2, unsignedp);
+	      if (!t2)
+		break;
+
+	      blocksize = b2;
+	      type = t2;
+	    }
+
+	  if (blocksize != lastsize)
+	    {
+	      ls->reg_acc_type = ls->type;
+	      ls->type = type;
+	      ls->size = blocksize;
+	      from = lastpos + blocksize;
+	    }
+
+	  if (from == to)
+	    return;
+	}
+    }
+
+  struct access **last_ptr;
+  if (*last_sibling)
+    last_ptr = &(*last_sibling)->next_sibling;
+  else
+    last_ptr = &parent->first_child;
+  *last_sibling = fill_padding_with_accesses (parent, from, to, last_ptr,
+					      next_child);
+}
+
 static bool totally_scalarize_subtree (struct access *root);
 
 /* Return true if INNER is either the same type as OUTER or if it is the type
@@ -3073,10 +3282,17 @@  total_should_skip_creating_access (struct access *parent,
 				   HOST_WIDE_INT size)
 {
   struct access *next_child;
+  HOST_WIDE_INT covered;
   if (!*last_seen_sibling)
-    next_child = parent->first_child;
+    {
+      next_child = parent->first_child;
+      covered = parent->offset;
+    }
   else
-    next_child = (*last_seen_sibling)->next_sibling;
+    {
+      next_child = (*last_seen_sibling)->next_sibling;
+      covered = (*last_seen_sibling)->offset + (*last_seen_sibling)->size;
+    }
 
   /* First, traverse the chain of siblings until it points to an access with
      offset at least equal to POS.  Check all skipped accesses whether they
@@ -3085,10 +3301,16 @@  total_should_skip_creating_access (struct access *parent,
     {
       if (next_child->offset + next_child->size > pos)
 	return TOTAL_FLD_FAILED;
+      total_scalarization_fill_padding (parent, last_seen_sibling, covered,
+					next_child->offset, next_child);
+      covered = next_child->offset + next_child->size;
       *last_seen_sibling = next_child;
       next_child = next_child->next_sibling;
     }
 
+  total_scalarization_fill_padding (parent, last_seen_sibling, covered, pos,
+				    next_child);
+
   /* Now check whether next_child has exactly the right POS and SIZE and if so,
      whether it can represent what we need and can be totally scalarized
      itself.  */
@@ -3273,6 +3495,34 @@  totally_scalarize_subtree (struct access *root)
     }
 
  out:
+  /* Even though there is nothing in the type, there can still be accesses here
+     in the IL.  */
+
+  HOST_WIDE_INT covered;
+  struct access *next_child;
+
+  if (last_seen_sibling)
+    {
+      covered = last_seen_sibling->offset + last_seen_sibling->size;
+      next_child = last_seen_sibling->next_sibling;
+    }
+  else
+    {
+      covered = root->offset;
+      next_child = root->first_child;
+    }
+
+  while (next_child)
+    {
+      total_scalarization_fill_padding (root, &last_seen_sibling, covered,
+					next_child->offset, next_child);
+      covered = next_child->offset + next_child->size;
+      last_seen_sibling = next_child;
+      next_child = next_child->next_sibling;
+    }
+
+  total_scalarization_fill_padding (root, &last_seen_sibling, covered,
+				    root->offset + root->size, NULL);
   return true;
 }
 
@@ -3655,7 +3905,6 @@  sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 
   if (access->grp_to_be_replaced)
     {
-      tree repl = get_access_replacement (access);
       /* If we replace a non-register typed access simply use the original
          access expression to extract the scalar component afterwards.
 	 This happens if scalarizing a function return value or parameter
@@ -3666,10 +3915,14 @@  sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
          be accessed as a different type too, potentially creating a need for
          type conversion (see PR42196) and when scalarized unions are involved
          in assembler statements (see PR42398).  */
-      if (!useless_type_conversion_p (type, access->type))
+      tree actual_type
+	= access->reg_acc_type ? access->reg_acc_type : access->type;
+
+      if (!useless_type_conversion_p (type, actual_type))
 	{
 	  tree ref;
 
+	  tree repl = get_access_replacement (access);
 	  ref = build_ref_for_model (loc, orig_expr, 0, access, gsi, false);
 
 	  if (write)
@@ -3696,7 +3949,21 @@  sra_modify_expr (tree *expr, gimple_stmt_iterator *gsi, bool write)
 	    }
 	}
       else
-	*expr = repl;
+	{
+	  gassign *conversion;
+	  tree repl = get_reg_access_replacement (loc, access, write,
+						  &conversion);
+	  *expr = repl;
+
+	  if (conversion)
+	    {
+	      if (write)
+		gsi_insert_after (gsi, conversion, GSI_NEW_STMT);
+	      else
+		gsi_insert_before (gsi, conversion, GSI_SAME_STMT);
+	    }
+	}
+
       sra_stats.exprs++;
     }
   else if (write && access->grp_to_be_debug_replaced)
@@ -3806,7 +4073,8 @@  load_assign_lhs_subreplacements (struct access *lacc,
 	  gassign *stmt;
 	  tree rhs;
 
-	  racc = find_access_in_subtree (sad->top_racc, offset, lacc->size);
+	  racc = find_access_in_subtree (sad->top_racc, offset, lacc->size,
+					 false);
 	  if (racc && racc->grp_to_be_replaced)
 	    {
 	      rhs = get_access_replacement (racc);
@@ -3857,7 +4125,7 @@  load_assign_lhs_subreplacements (struct access *lacc,
 	      tree drhs;
 	      struct access *racc = find_access_in_subtree (sad->top_racc,
 							    offset,
-							    lacc->size);
+							    lacc->size, false);
 
 	      if (racc && racc->grp_to_be_replaced)
 		{
@@ -4010,8 +4278,11 @@  sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
   loc = gimple_location (stmt);
   if (lacc && lacc->grp_to_be_replaced)
     {
-      lhs = get_access_replacement (lacc);
+      gassign *lhs_conversion = NULL;
+      lhs = get_reg_access_replacement (loc, lacc, true, &lhs_conversion);
       gimple_assign_set_lhs (stmt, lhs);
+      if (lhs_conversion)
+	gsi_insert_after (gsi, lhs_conversion, GSI_NEW_STMT);
       modify_this_stmt = true;
       if (lacc->grp_partial_lhs)
 	force_gimple_rhs = true;
@@ -4020,7 +4291,10 @@  sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
 
   if (racc && racc->grp_to_be_replaced)
     {
-      rhs = get_access_replacement (racc);
+      gassign *rhs_conversion = NULL;
+      rhs = get_reg_access_replacement (loc, racc, false, &rhs_conversion);
+      if (rhs_conversion)
+	gsi_insert_before (&orig_gsi, rhs_conversion, GSI_SAME_STMT);
       modify_this_stmt = true;
       if (racc->grp_partial_lhs)
 	force_gimple_rhs = true;