[RFC,vectorizer] Fix ICE with masked vectors

Message ID c9f7a19a-d3c4-46e2-2cb0-dc6aadd47077@codesourcery.com
State New
Headers show
Series
  • [RFC,vectorizer] Fix ICE with masked vectors
Related show

Commit Message

Andrew Stubbs Dec. 9, 2019, 3:21 p.m.
Hi,

This patch fixes an ICE in testcase gcc.dg/vect/vect-ctor-1.c:

during GIMPLE pass: vect
dump file: vect-ctor-1.c.159t.vect
.../gcc.dg/vect/vect-ctor-1.c: In function 'intrapred_luma_16x16':
.../gcc.dg/vect/vect-ctor-1.c:9:6: internal compiler error: in 
exact_div, at poly-int.h:2162
0xdf845f poly_int<1u, poly_result<unsigned long, if_nonpoly<unsigned 
long, unsigned long, poly_int_traits<unsigned long>::is_poly>::type, 
poly_coeff_pair_traits<unsigned long, if_nonpoly<unsigned long, unsigned 
long, poly_int_traits<unsigned 
long>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, 
unsigned long>(poly_int_pod<1u, unsigned long> const&, unsigned long)
         /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2162
0xdf649a poly_int<1u, poly_result<unsigned long, unsigned long, 
poly_coeff_pair_traits<unsigned long, unsigned 
long>::result_kind>::type> exact_div<1u, unsigned long, unsigned 
long>(poly_int_pod<1u, unsigned long> const&, poly_int_pod<1u, unsigned 
long> const&)
         /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2175
0x1c473cd vect_get_num_vectors
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.h:1520
0x1c4bd35 vect_enhance_data_refs_alignment(_loop_vec_info*)
 
/scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-data-refs.c:1798
0x1596732 vect_analyze_loop_2
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2095
0x15980f3 vect_analyze_loop(loop*, vec_info_shared*)
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2536
0x15d7b36 try_vectorize_loop_1
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:892
0x15d831f try_vectorize_loop
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1044
0x15d84f9 vectorize_loops()
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1125
0x144f0af execute
         /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-ssa-loop.c:414
Please submit a full bug report,
with preprocessed source if appropriate.
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.


The problem is that exact_div is being asked to do "8 / 64", which it 
won't. The comment on the function says "NUNITS should be based on the 
vectorization factor, so it is always a known multiple of the number of 
elements in VECTYPE". This is on the amdgcn target where the 
vectorization factor is always 64, but smaller tasks can be vectorized 
using masking.

I think what's happening here is that the assumption described in the 
comment is invalid in the presence of masked vectors.

The attached patch fixes the ICE in the testcase, but I suspect does not 
go far enough. Can it happen that NUNITS can be greater than the 
vectorization factor, but not a multiple? Is this even a valid fix in 
the first place? Must it be conditionalized on masking being available? 
Is the exactness even worth checking, in the presence of exceptions?

Thanks

Andrew

Comments

Richard Sandiford Dec. 9, 2019, 3:59 p.m. | #1
Andrew Stubbs <ams@codesourcery.com> writes:
> Hi,

>

> This patch fixes an ICE in testcase gcc.dg/vect/vect-ctor-1.c:

>

> during GIMPLE pass: vect

> dump file: vect-ctor-1.c.159t.vect

> .../gcc.dg/vect/vect-ctor-1.c: In function 'intrapred_luma_16x16':

> .../gcc.dg/vect/vect-ctor-1.c:9:6: internal compiler error: in 

> exact_div, at poly-int.h:2162

> 0xdf845f poly_int<1u, poly_result<unsigned long, if_nonpoly<unsigned 

> long, unsigned long, poly_int_traits<unsigned long>::is_poly>::type, 

> poly_coeff_pair_traits<unsigned long, if_nonpoly<unsigned long, unsigned 

> long, poly_int_traits<unsigned 

> long>::is_poly>::type>::result_kind>::type> exact_div<1u, unsigned long, 

> unsigned long>(poly_int_pod<1u, unsigned long> const&, unsigned long)

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2162

> 0xdf649a poly_int<1u, poly_result<unsigned long, unsigned long, 

> poly_coeff_pair_traits<unsigned long, unsigned 

> long>::result_kind>::type> exact_div<1u, unsigned long, unsigned 

> long>(poly_int_pod<1u, unsigned long> const&, poly_int_pod<1u, unsigned 

> long> const&)

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/poly-int.h:2175

> 0x1c473cd vect_get_num_vectors

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.h:1520

> 0x1c4bd35 vect_enhance_data_refs_alignment(_loop_vec_info*)

>  

> /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-data-refs.c:1798

> 0x1596732 vect_analyze_loop_2

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2095

> 0x15980f3 vect_analyze_loop(loop*, vec_info_shared*)

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vect-loop.c:2536

> 0x15d7b36 try_vectorize_loop_1

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:892

> 0x15d831f try_vectorize_loop

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1044

> 0x15d84f9 vectorize_loops()

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-vectorizer.c:1125

> 0x144f0af execute

>          /scratch/astubbs/amd/src/gcc-mainline/gcc/tree-ssa-loop.c:414

> Please submit a full bug report,

> with preprocessed source if appropriate.

> Please include the complete backtrace with any bug report.

> See <https://gcc.gnu.org/bugs/> for instructions.

>

>

> The problem is that exact_div is being asked to do "8 / 64", which it 

> won't. The comment on the function says "NUNITS should be based on the 

> vectorization factor, so it is always a known multiple of the number of 

> elements in VECTYPE". This is on the amdgcn target where the 

> vectorization factor is always 64, but smaller tasks can be vectorized 

> using masking.

>

> I think what's happening here is that the assumption described in the 

> comment is invalid in the presence of masked vectors.


No, the assumption's correct even there.  The assert usually triggers
because something elsewhere is getting confused about the vector types.

> The attached patch fixes the ICE in the testcase, but I suspect does not 

> go far enough. Can it happen that NUNITS can be greater than the 

> vectorization factor, but not a multiple? Is this even a valid fix in 

> the first place? Must it be conditionalized on masking being available? 

> Is the exactness even worth checking, in the presence of exceptions?


The vector types and VF aren't chosen based on whether masking is available.
It happens the other way around: we first analyse the loop and pick the VF
for an unmasked loop, but record as we go whether a masked implementation
is also possible.  Then we decide at the end whether to use a masked
implementation instead of an unmasked one.

So if this assert triggers for masked loops, it could trigger for unmasked
loops too.

FWIW there's an instance of this for SVE that I haven't got around
to debugging yet, but from a quick look at the dump, it was somehow
combining a vector of 8 longs with a vector of 4 floats.  I'm not sure
it's going to be the same issue as yours though.

Thanks,
Richard

>

> Thanks

>

> Andrew

>

> WIP Fix vect-ctor-1.c ICE

>

>

> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h

> index 51a13f1d207..bf1c3eeda85 100644

> --- a/gcc/tree-vectorizer.h

> +++ b/gcc/tree-vectorizer.h

> @@ -1513,6 +1513,10 @@ vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo)

>  static inline unsigned int

>  vect_get_num_vectors (poly_uint64 nunits, tree vectype)

>  {

> +  /* Masked vectors can cause partial vector use.  */

> +  if (known_lt (nunits, TYPE_VECTOR_SUBPARTS (vectype)))

> +    return 1;

> +

>    return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();

>  }

>
Andrew Stubbs Dec. 10, 2019, 1:27 p.m. | #2
On 09/12/2019 15:59, Richard Sandiford wrote:
> No, the assumption's correct even there.  The assert usually triggers

> because something elsewhere is getting confused about the vector types.

> 

>> The attached patch fixes the ICE in the testcase, but I suspect does not

>> go far enough. Can it happen that NUNITS can be greater than the

>> vectorization factor, but not a multiple? Is this even a valid fix in

>> the first place? Must it be conditionalized on masking being available?

>> Is the exactness even worth checking, in the presence of exceptions?

> 

> The vector types and VF aren't chosen based on whether masking is available.

> It happens the other way around: we first analyse the loop and pick the VF

> for an unmasked loop, but record as we go whether a masked implementation

> is also possible.  Then we decide at the end whether to use a masked

> implementation instead of an unmasked one.

> 

> So if this assert triggers for masked loops, it could trigger for unmasked

> loops too.


OK, I completely misunderstood what was happening here.

What happens is that it goes through and finds vector types for every 
statement, and then says "Updating vectorization factor to 4.", but 
doesn't then go back and check for suitable types.

So, then it gets to this:

      if (unlimited_cost_model (LOOP_VINFO_LOOP (loop_vinfo)))
        {
          poly_uint64 nscalars = (STMT_SLP_TYPE (stmt_info)
                                  ? vf * DR_GROUP_SIZE (stmt_info) : vf);
          possible_npeel_number
            = vect_get_num_vectors (nscalars, vectype);

          /* NPEEL_TMP is 0 when there is no misalignment, but also
             allow peeling NELEMENTS.  */
          if (DR_MISALIGNMENT (dr_info) == 0)
            possible_npeel_number++;
        }

where "vf" is now 4, the group size appears to be 2, and "vectype" is 
V64SI, and vect_get_num_vectors blows up trying to divide 8 by 64.

If I switch back to the default cost model then the "vect" pass 
completes successfully, although vectorization fails due to a missing 
vector operator. The following "slp2" pass then switches to TImode fake 
vectors and works fine.

Alternatively, if I add back my patch then the pass completes the same 
way (without vectorization).

Any suggestions? I can't see how this stuff is supposed to work?

Andrew

Patch

WIP Fix vect-ctor-1.c ICE


diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
index 51a13f1d207..bf1c3eeda85 100644
--- a/gcc/tree-vectorizer.h
+++ b/gcc/tree-vectorizer.h
@@ -1513,6 +1513,10 @@  vect_use_loop_mask_for_alignment_p (loop_vec_info loop_vinfo)
 static inline unsigned int
 vect_get_num_vectors (poly_uint64 nunits, tree vectype)
 {
+  /* Masked vectors can cause partial vector use.  */
+  if (known_lt (nunits, TYPE_VECTOR_SUBPARTS (vectype)))
+    return 1;
+
   return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
 }