[PR85804] Fix wrong code by correcting bump step computation in vector(1) load of single-element group access

Message ID DB6PR0802MB2504F8D3210C3BD88374FE89E7950@DB6PR0802MB2504.eurprd08.prod.outlook.com

Commit Message

Bin Cheng May 21, 2018, 1:14 p.m.
Hi,
As reported in PR85804, bump step is wrongly computed for vector(1) load of
single-element group access.  This patch fixes the issue by correcting bump
step computation for the specific VMAT_CONTIGUOUS case.

Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

Thanks,
bin

2018-05-17  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/85804
	* tree-vect-stmts.c (vectorizable_load): Compute correct bump step
	for vector(1) load in single-element group access.

gcc/testsuite
2018-05-17  Bin Cheng  <bin.cheng@arm.com>

	PR tree-optimization/85804
	* gcc.c-torture/execute/pr85804.c: New test.

Comments

Richard Biener May 22, 2018, 9:18 a.m. | #1
On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:

> Hi,

> As reported in PR85804, bump step is wrongly computed for vector(1) load
> of single-element group access.  This patch fixes the issue by correcting
> bump step computation for the specific VMAT_CONTIGUOUS case.


> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?


To me it looks like the classification as VMAT_CONTIGUOUS is bogus.
We'd fall into the grouped_load case otherwise which should handle
the situation correctly?

Richard?

Thanks,
Richard.


> Thanks,

> bin


> 2018-05-17  Bin Cheng  <bin.cheng@arm.com>


>          PR tree-optimization/85804

>          * tree-vect-stmts.c (vectorizable_load): Compute correct bump step

>          for vector(1) load in single-element group access.


> gcc/testsuite

> 2018-05-17  Bin Cheng  <bin.cheng@arm.com>


>          PR tree-optimization/85804

>          * gcc.c-torture/execute/pr85804.c: New test.
Richard Sandiford May 22, 2018, 12:11 p.m. | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:

>

>> Hi,

>> As reported in PR85804, bump step is wrongly computed for vector(1) load
>> of single-element group access.  This patch fixes the issue by correcting
>> bump step computation for the specific VMAT_CONTIGUOUS case.

>

>> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

>

> To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

> We'd fall into the grouped_load case otherwise which should handle

> the situation correctly?

>

> Richard?


Yeah, I agree.  I mentioned to Bin privately that that was probably
a misstep and that we should instead continue to treat them as
VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector
from the array of loaded vectors, instead of doing an actual permute.

(Note that VMAT_CONTIGUOUS is OK for stores, since we don't allow
gaps there.  But it might be easiest to handle both loads and stores
in the same way.)

Although it still seems weird to "vectorise" stuff to one element.
Why not leave the original scalar code in place, and put the onus on
whatever wants to produce or consume a V1 to do the appropriate
conversion?

Thanks,
Richard
Richard Biener May 23, 2018, 10:19 a.m. | #3
On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <
richard.sandiford@linaro.org> wrote:

> Richard Biener <richard.guenther@gmail.com> writes:

> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:

> >

> >> Hi,

> >> As reported in PR85804, bump step is wrongly computed for vector(1) load
> >> of single-element group access.  This patch fixes the issue by correcting
> >> bump step computation for the specific VMAT_CONTIGUOUS case.

> >

> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

> >

> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

> > We'd fall into the grouped_load case otherwise which should handle

> > the situation correctly?

> >

> > Richard?


> Yeah, I agree.  I mentioned to Bin privately that that was probably

> a misstep and that we should instead continue to treat them as

> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector

> from the array of loaded vectors, instead of doing an actual permute.


I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS
should be only for no-gap vectors.  How do we classify single-element
interleaving?  That would be another classification choice.

> (Note that VMAT_CONTIGUOUS is OK for stores, since we don't allow

> gaps there.  But it might be easiest to handle both loads and stores

> in the same way.)


> Although it still seems weird to "vectorise" stuff to one element.

> Why not leave the original scalar code in place, and put the onus on

> whatever wants to produce or consume a V1 to do the appropriate

> conversion?


Yeah, V1 is somewhat awkward to deal with but I think the way we're
doing it right now is OK.  Note if we leave the original scalar code in
place we still have to deal with larger unrolling factors thus we'd have
to duplicate the scalar code and have different IVs anyways.

Richard.

> Thanks,

> Richard
Bin.Cheng May 23, 2018, 10:52 a.m. | #4
On Wed, May 23, 2018 at 11:19 AM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <

> richard.sandiford@linaro.org> wrote:

>

>> Richard Biener <richard.guenther@gmail.com> writes:

>> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:

>> >

>> >> Hi,

>> >> As reported in PR85804, bump step is wrongly computed for vector(1) load
>> >> of single-element group access.  This patch fixes the issue by correcting
>> >> bump step computation for the specific VMAT_CONTIGUOUS case.

>> >

>> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

>> >

>> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

>> > We'd fall into the grouped_load case otherwise which should handle

>> > the situation correctly?

>> >

>> > Richard?

>

>> Yeah, I agree.  I mentioned to Bin privately that that was probably

>> a misstep and that we should instead continue to treat them as

>> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector

>> from the array of loaded vectors, instead of doing an actual permute.

>

> I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS

> should be only for no-gap vectors.  How do we classify single-element

> interleaving?  That would be another classification choice.

Yes, I suggested this offline too, but Richard may have more to say about this.
One thing worth noting is classifying it as VMAT_ELEMENTWISE would
disable vectorization in this case because of cost model issue as
commented at the end of get_load_store_type.

Thanks,
bin
>

>> (Note that VMAT_CONTIGUOUS is OK for stores, since we don't allow

>> gaps there.  But it might be easiest to handle both loads and stores

>> in the same way.)

>

>> Although it still seems weird to "vectorise" stuff to one element.

>> Why not leave the original scalar code in place, and put the onus on

>> whatever wants to produce or consume a V1 to do the appropriate

>> conversion?

>

> Yeah, V1 is somewhat awkward to deal with but I think the way we're

> doing it right now is OK.  Note if we leave the original scalar code in

> place we still have to deal with larger unrolling factors thus we'd have

> to duplicate the scalar code and have different IVs anyways.

>

> Richard.

>

>> Thanks,

>> Richard
Richard Sandiford May 23, 2018, 11:01 a.m. | #5
"Bin.Cheng" <amker.cheng@gmail.com> writes:
> On Wed, May 23, 2018 at 11:19 AM, Richard Biener

> <richard.guenther@gmail.com> wrote:

>> On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <

>> richard.sandiford@linaro.org> wrote:

>>

>>> Richard Biener <richard.guenther@gmail.com> writes:

>>> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:

>>> >

>>> >> Hi,

>>> >> As reported in PR85804, bump step is wrongly computed for vector(1) load
>>> >> of single-element group access.  This patch fixes the issue by correcting
>>> >> bump step computation for the specific VMAT_CONTIGUOUS case.

>>> >

>>> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

>>> >

>>> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

>>> > We'd fall into the grouped_load case otherwise which should handle

>>> > the situation correctly?

>>> >

>>> > Richard?

>>

>>> Yeah, I agree.  I mentioned to Bin privately that that was probably

>>> a misstep and that we should instead continue to treat them as

>>> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector

>>> from the array of loaded vectors, instead of doing an actual permute.

>>

>> I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS

>> should be only for no-gap vectors.  How do we classify single-element

>> interleaving?  That would be another classification choice.

> Yes, I suggested this offline too, but Richard may have more to say about this.

> One thing worth noting is classifying it as VMAT_ELEMENTWISE would

> disable vectorization in this case because of cost model issue as

> commented at the end of get_load_store_type.


Yeah, that's the problem.  Using VMAT_ELEMENTWISE also means that we
use a scalar load and then insert it into a vector, whereas all we want
(and all we currently generate) is a single vector load.

So if we classify them as VMAT_ELEMENTWISE, they'll be a special case
for both costing (vector load rather than scalar load and vector
construct) and code-generation.

Thanks,
Richard
Bin.Cheng May 23, 2018, 11:10 a.m. | #6
On Wed, May 23, 2018 at 12:01 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> "Bin.Cheng" <amker.cheng@gmail.com> writes:

>> On Wed, May 23, 2018 at 11:19 AM, Richard Biener

>> <richard.guenther@gmail.com> wrote:

>>> On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <

>>> richard.sandiford@linaro.org> wrote:

>>>

>>>> Richard Biener <richard.guenther@gmail.com> writes:

>>>> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:

>>>> >

>>>> >> Hi,

>>>> >> As reported in PR85804, bump step is wrongly computed for vector(1) load
>>>> >> of single-element group access.  This patch fixes the issue by correcting
>>>> >> bump step computation for the specific VMAT_CONTIGUOUS case.

>>>> >

>>>> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

>>>> >

>>>> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

>>>> > We'd fall into the grouped_load case otherwise which should handle

>>>> > the situation correctly?

>>>> >

>>>> > Richard?

>>>

>>>> Yeah, I agree.  I mentioned to Bin privately that that was probably

>>>> a misstep and that we should instead continue to treat them as

>>>> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector

>>>> from the array of loaded vectors, instead of doing an actual permute.

>>>

>>> I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS

>>> should be only for no-gap vectors.  How do we classify single-element

>>> interleaving?  That would be another classification choice.

>> Yes, I suggested this offline too, but Richard may have more to say about this.

>> One thing worth noting is classifying it as VMAT_ELEMENTWISE would

>> disable vectorization in this case because of cost model issue as

>> commented at the end of get_load_store_type.

>

> Yeah, that's the problem.  Using VMAT_ELEMENTWISE also means that we

> use a scalar load and then insert it into a vector, whereas all we want

> (and all we currently generate) is a single vector load.

>

> So if we classify them as VMAT_ELEMENTWISE, they'll be a special case

> for both costing (vector load rather than scalar load and vector

> construct) and code-generation.

Looks to me it will be a special case for VMAT_ELEMENTWISE or
VMAT_CONTIGUOUS* anyway, probably VMAT_CONTIGUOUS is the easiest one?

Thanks,
bin
>

> Thanks,

> Richard
Richard Biener May 23, 2018, 11:11 a.m. | #7
On Wed, May 23, 2018 at 1:02 PM Richard Sandiford <
richard.sandiford@linaro.org> wrote:

> "Bin.Cheng" <amker.cheng@gmail.com> writes:

> > On Wed, May 23, 2018 at 11:19 AM, Richard Biener

> > <richard.guenther@gmail.com> wrote:

> >> On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <

> >> richard.sandiford@linaro.org> wrote:

> >>

> >>> Richard Biener <richard.guenther@gmail.com> writes:

> >>> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:

> >>> >

> >>> >> Hi,

> >>> >> As reported in PR85804, bump step is wrongly computed for vector(1) load
> >>> >> of single-element group access.  This patch fixes the issue by correcting
> >>> >> bump step computation for the specific VMAT_CONTIGUOUS case.

> >>> >

> >>> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

> >>> >

> >>> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

> >>> > We'd fall into the grouped_load case otherwise which should handle

> >>> > the situation correctly?

> >>> >

> >>> > Richard?

> >>

> >>> Yeah, I agree.  I mentioned to Bin privately that that was probably

> >>> a misstep and that we should instead continue to treat them as

> >>> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector

> >>> from the array of loaded vectors, instead of doing an actual permute.

> >>

> >> I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS

> >> should be only for no-gap vectors.  How do we classify single-element

> >> interleaving?  That would be another classification choice.

> > Yes, I suggested this offline too, but Richard may have more to say
> > about this.
> > One thing worth noting is classifying it as VMAT_ELEMENTWISE would

> > disable vectorization in this case because of cost model issue as

> > commented at the end of get_load_store_type.


> Yeah, that's the problem.  Using VMAT_ELEMENTWISE also means that we

> use a scalar load and then insert it into a vector, whereas all we want

> (and all we currently generate) is a single vector load.


> So if we classify them as VMAT_ELEMENTWISE, they'll be a special case

> for both costing (vector load rather than scalar load and vector

> construct) and code-generation.


But V1 elementwise loads already work by loading with the V1 vector type
(Bin fixed that recently).  But yes, the costmodel thing should be fixed
anyways.

Richard.

> Thanks,

> Richard
Richard Biener May 23, 2018, 11:12 a.m. | #8
On Wed, May 23, 2018 at 1:10 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

> On Wed, May 23, 2018 at 12:01 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

> > "Bin.Cheng" <amker.cheng@gmail.com> writes:

> >> On Wed, May 23, 2018 at 11:19 AM, Richard Biener

> >> <richard.guenther@gmail.com> wrote:

> >>> On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <

> >>> richard.sandiford@linaro.org> wrote:

> >>>

> >>>> Richard Biener <richard.guenther@gmail.com> writes:

> >>>> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:
> >>>> >
> >>>> >> Hi,
> >>>> >> As reported in PR85804, bump step is wrongly computed for vector(1) load
> >>>> >> of single-element group access.  This patch fixes the issue by correcting
> >>>> >> bump step computation for the specific VMAT_CONTIGUOUS case.

> >>>> >

> >>>> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

> >>>> >

> >>>> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

> >>>> > We'd fall into the grouped_load case otherwise which should handle

> >>>> > the situation correctly?

> >>>> >

> >>>> > Richard?

> >>>

> >>>> Yeah, I agree.  I mentioned to Bin privately that that was probably

> >>>> a misstep and that we should instead continue to treat them as

> >>>> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector

> >>>> from the array of loaded vectors, instead of doing an actual permute.

> >>>

> >>> I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS

> >>> should be only for no-gap vectors.  How do we classify single-element

> >>> interleaving?  That would be another classification choice.

> >> Yes, I suggested this offline too, but Richard may have more to say
> >> about this.
> >> One thing worth noting is classifying it as VMAT_ELEMENTWISE would

> >> disable vectorization in this case because of cost model issue as

> >> commented at the end of get_load_store_type.

> >

> > Yeah, that's the problem.  Using VMAT_ELEMENTWISE also means that we

> > use a scalar load and then insert it into a vector, whereas all we want

> > (and all we currently generate) is a single vector load.

> >

> > So if we classify them as VMAT_ELEMENTWISE, they'll be a special case

> > for both costing (vector load rather than scalar load and vector

> > construct) and code-generation.

> Looks to me it will be a special case for VMAT_ELEMENTWISE or

> VMAT_CONTIGUOUS* anyway, probably VMAT_CONTIGUOUS is the easiest one?


The question is why we do

   if (memory_access_type == VMAT_GATHER_SCATTER
       || (!slp && memory_access_type == VMAT_CONTIGUOUS))
     grouped_load = false;

I think for the particular testcase removing this would fix the issue as
well?

Richard.

> Thanks,

> bin

> >

> > Thanks,

> > Richard
Bin.Cheng May 23, 2018, 11:53 a.m. | #9
On Wed, May 23, 2018 at 12:12 PM, Richard Biener
<richard.guenther@gmail.com> wrote:
> On Wed, May 23, 2018 at 1:10 PM Bin.Cheng <amker.cheng@gmail.com> wrote:

>

>> On Wed, May 23, 2018 at 12:01 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>> > "Bin.Cheng" <amker.cheng@gmail.com> writes:

>> >> On Wed, May 23, 2018 at 11:19 AM, Richard Biener

>> >> <richard.guenther@gmail.com> wrote:

>> >>> On Tue, May 22, 2018 at 2:11 PM Richard Sandiford <

>> >>> richard.sandiford@linaro.org> wrote:

>> >>>

>> >>>> Richard Biener <richard.guenther@gmail.com> writes:

>> >>>> > On Mon, May 21, 2018 at 3:14 PM Bin Cheng <Bin.Cheng@arm.com> wrote:
>> >>>> >
>> >>>> >> Hi,
>> >>>> >> As reported in PR85804, bump step is wrongly computed for vector(1) load
>> >>>> >> of single-element group access.  This patch fixes the issue by correcting
>> >>>> >> bump step computation for the specific VMAT_CONTIGUOUS case.
>> >>>> >

>> >>>> >> Bootstrap and test on x86_64 and AArch64 ongoing, is it OK?

>> >>>> >

>> >>>> > To me it looks like the classification as VMAT_CONTIGUOUS is bogus.

>> >>>> > We'd fall into the grouped_load case otherwise which should handle

>> >>>> > the situation correctly?

>> >>>> >

>> >>>> > Richard?

>> >>>

>> >>>> Yeah, I agree.  I mentioned to Bin privately that that was probably

>> >>>> a misstep and that we should instead continue to treat them as

>> >>>> VMAT_CONTIGUOUS_PERMUTE, but simply select the required vector

>> >>>> from the array of loaded vectors, instead of doing an actual permute.

>> >>>

>> >>> I'd classify them as VMAT_ELEMENTWISE instead.  CONTIGUOUS

>> >>> should be only for no-gap vectors.  How do we classify single-element

>> >>> interleaving?  That would be another classification choice.

>> >> Yes, I suggested this offline too, but Richard may have more to say
>> >> about this.

>> >> One thing worth noting is classifying it as VMAT_ELEMENTWISE would

>> >> disable vectorization in this case because of cost model issue as

>> >> commented at the end of get_load_store_type.

>> >

>> > Yeah, that's the problem.  Using VMAT_ELEMENTWISE also means that we

>> > use a scalar load and then insert it into a vector, whereas all we want

>> > (and all we currently generate) is a single vector load.

>> >

>> > So if we classify them as VMAT_ELEMENTWISE, they'll be a special case

>> > for both costing (vector load rather than scalar load and vector

>> > construct) and code-generation.

>> Looks to me it will be a special case for VMAT_ELEMENTWISE or

>> VMAT_CONTIGUOUS* anyway, probably VMAT_CONTIGUOUS is the easiest one?

>

> The question is why we do

>

>    if (memory_access_type == VMAT_GATHER_SCATTER

>        || (!slp && memory_access_type == VMAT_CONTIGUOUS))

>      grouped_load = false;

>

> I think for the particular testcase removing this would fix the issue as
> well?

No, simply relaxing this check would result in generating redundant
loads (for the gap).  It also leads to vect_transform_grouped_load as
well as vect_permute_load_chain, so we would need the same special-case
handling as classifying it as VMAT_CONTIGUOUS_PERMUTE.

Thanks,
bin

>

> Richard.

>

>> Thanks,

>> bin

>> >

>> > Thanks,

>> > Richard

Patch

From 502bcd1e445186a56b6ea254a0cd2406fb62f08c Mon Sep 17 00:00:00 2001
From: Bin Cheng <binche01@e108451-lin.cambridge.arm.com>
Date: Fri, 18 May 2018 14:18:14 +0100
Subject: [PATCH] pr85804-20180517

---
 gcc/testsuite/gcc.c-torture/execute/pr85804.c | 22 ++++++++++++++++++++++
 gcc/tree-vect-stmts.c                         | 15 +++++++++++++++
 2 files changed, 37 insertions(+)
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/pr85804.c

diff --git a/gcc/testsuite/gcc.c-torture/execute/pr85804.c b/gcc/testsuite/gcc.c-torture/execute/pr85804.c
new file mode 100644
index 0000000..b8929b1
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr85804.c
@@ -0,0 +1,22 @@ 
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model" } */
+
+long d[64] = {0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
+              0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
+};
+
+void abort ();
+void __attribute__((noinline)) foo (int b)
+{
+  if (b)
+    abort ();
+}
+int main() {
+  int b = 0;
+  for (int c = 0; c <= 5; c++)
+    b ^= d[c * 5 + 1];
+  foo (b);
+  return 0;
+}
+
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 64a157d..e6b95b3 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -7262,6 +7262,7 @@  vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
   gphi *phi = NULL;
   vec<tree> dr_chain = vNULL;
   bool grouped_load = false;
+  bool single_element = false;
   gimple *first_stmt;
   gimple *first_stmt_for_drptr = NULL;
   bool inv_p;
@@ -7822,6 +7823,15 @@  vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       first_stmt = stmt;
       first_dr = dr;
       group_size = vec_num = 1;
+      /* For single-element vector in a single-element group, record the group
+	 size in order to compute correct bump size.  */
+      if (!slp
+	  && memory_access_type == VMAT_CONTIGUOUS
+	  && STMT_VINFO_GROUPED_ACCESS (stmt_info))
+	{
+	  single_element = true;
+	  group_size = GROUP_SIZE (vinfo_for_stmt (first_stmt));
+	}
       group_gap_adj = 0;
       ref_type = reference_alias_ptr_type (DR_REF (first_dr));
     }
@@ -7992,6 +8002,11 @@  vectorizable_load (gimple *stmt, gimple_stmt_iterator *gsi, gimple **vec_stmt,
       else
 	aggr_type = vectype;
       bump = vect_get_data_ptr_increment (dr, aggr_type, memory_access_type);
+      /* Multiply bump size by group size for single-element vector in single-
+	 element group.  */
+      if (single_element && group_size > 1)
+	bump = fold_build2 (MULT_EXPR, TREE_TYPE (bump), bump,
+			    build_int_cst (TREE_TYPE (bump), group_size));
     }
 
   tree vec_mask = NULL_TREE;
-- 
1.9.1