[stage1] Lower VEC_COND_EXPR into internal functions.

Message ID 0e5c0fa7-9719-e3ab-b909-f1c0b052b283@suse.cz

Commit Message

Martin Liška April 1, 2020, 10:19 a.m.
Hello.

This is the second attempt to get rid of tcc_comparison GENERIC trees
being used as the first argument of VEC_COND_EXPR.

The patch achieves that in the following steps:
1) the veclower pass expands all tcc_comparison expressions into a SSA_NAME
2) such a tcc_comparison can then no longer be used as the first argument
    of VEC_COND_EXPR (enforced in the GIMPLE verifier)
3) I exposed new internal functions with:
DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)

4) the logic of expand_vec_cond_expr is moved into the new pass_gimple_isel pass
5) the pass expands each VEC_COND_EXPR into one of the internal functions defined in 3)
6) moreover, I've added new logic that prefers expand_vec_cmp_expr_p when
    a SSA_NAME is used in multiple (2+) VEC_COND_EXPR statements
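
For illustration, a rough sketch (SSA names made up) of what the isel pass
is intended to produce on a GIMPLE dump; the comparison code is passed as
the last argument of .VCOND, as in the expander of the attached patch:

  a) after veclower, the condition is a separate statement:
       mask_3 = a_1 < b_2;
       res_4 = VEC_COND_EXPR <mask_3, c_5, d_6>;
  b) isel folds the comparison into the selection:
       res_4 = .VCOND (a_1, b_2, c_5, d_6, LT_EXPR);
  c) or isel keeps the mask and uses vcond_mask:
       res_4 = .VCOND_MASK (mask_3, c_5, d_6);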

The patch bootstraps on x86_64-linux-gnu and survives regression tests.
Moreover, I ran the SPEC2006 and SPEC2017 benchmarks on znver1, znver2 and
skylake targets and I don't see any measurable change.

Achieved benefits of the patch:
- removal of a GENERIC expression being used in GIMPLE statements
- extraction into SSA_NAMEs can enable proper tree optimizations (FRE, DOM, PRE)
- possibility to expand smarter based on the number of uses (expand_vec_cmp_expr_p)

Future plans:
- remove tcc_comparison already during gimplification
- remove the code where these expressions are handled for VEC_COND_EXPR
- do a similar thing for COND_EXPR?

The task was guided by Richi (Biener) and I bet he can help with further
questions and reasoning.

Thanks,
Martin

Comments

Richard Sandiford April 6, 2020, 9:17 a.m. | #1
Martin Liška <mliska@suse.cz> writes:
> Hello.
>
> This is the second attempt to get rid of tcc_comparison GENERIC trees
> being used as the first argument of VEC_COND_EXPR.
> [...]

Thanks for doing this.  It definitely seems more friendly than the
four-operand version to targets where separate comparisons are the norm.

Just a couple of comments about the implementation:

> diff --git a/gcc/passes.def b/gcc/passes.def
> index 2bf2cb78fc5..d654e5ee9fe 100644
> --- a/gcc/passes.def
> +++ b/gcc/passes.def
> @@ -397,6 +397,7 @@ along with GCC; see the file COPYING3.  If not see
>    NEXT_PASS (pass_cleanup_eh);
>    NEXT_PASS (pass_lower_resx);
>    NEXT_PASS (pass_nrv);
> +  NEXT_PASS (pass_gimple_isel);
>    NEXT_PASS (pass_cleanup_cfg_post_optimizing);
>    NEXT_PASS (pass_warn_function_noreturn);
>    NEXT_PASS (pass_gen_hsail);

What was the reason for making this a separate pass, rather than doing
it as part of veclower?  If we do them separately, then it's harder for
veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
that's a general problem between veclower and expand already, but it
seems like the new approach could help to move away from that by
doing the instruction selection directly in veclower.)

> +/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
> +   function based on type of selected expansion.  */
> +
> +static gimple *
> +gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
> +{
> +  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
> +  enum tree_code code;
> +  enum tree_code tcode;
> +  machine_mode cmp_op_mode;
> +  bool unsignedp;
> +  enum insn_code icode;
> +  imm_use_iterator imm_iter;
> +
> +  /* Only consider code == GIMPLE_ASSIGN.  */
> +  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
> +  if (!stmt)
> +    return NULL;
> +
> +  code = gimple_assign_rhs_code (stmt);
> +  if (code != VEC_COND_EXPR)
> +    return NULL;
> +
> +  tree op0 = gimple_assign_rhs1 (stmt);
> +  tree op1 = gimple_assign_rhs2 (stmt);
> +  tree op2 = gimple_assign_rhs3 (stmt);
> +  lhs = gimple_assign_lhs (stmt);
> +  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
> +
> +  gcc_assert (!COMPARISON_CLASS_P (op0));
> +  if (TREE_CODE (op0) == SSA_NAME)
> +    {
> +      unsigned int used_vec_cond_exprs = 0;
> +      gimple *use_stmt;
> +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> +	{
> +	  gassign *assign = dyn_cast<gassign *> (use_stmt);
> +	  if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> +	      && gimple_assign_rhs1 (assign) == op0)
> +	    used_vec_cond_exprs++;
> +	}

This looks like it's quadratic in the worst case.  Could we check
this in a different way?

> +
> +      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
> +      if (def_stmt)
> +	{
> +	  tcode = gimple_assign_rhs_code (def_stmt);
> +	  op0a = gimple_assign_rhs1 (def_stmt);
> +	  op0b = gimple_assign_rhs2 (def_stmt);
> +
> +	  tree op0a_type = TREE_TYPE (op0a);
> +	  if (used_vec_cond_exprs >= 2

It would be good if targets were able to provide only vcond_mask.
In that case I guess we should go this path if the later one would fail.

> +	      && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
> +		  != CODE_FOR_nothing)
> +	      && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
> +	    {
> +	      /* Keep the SSA name and use vcond_mask.  */
> +	      tcode = TREE_CODE (op0);
> +	    }
> +	}
> +      else
> +	tcode = TREE_CODE (op0);
> +    }
> +  else
> +    tcode = TREE_CODE (op0);

Might be easier to follow if tcode is TREE_CODE (op0) by default and
only gets changed when we want to fold in the comparison.
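
A rough, untested sketch of that structure, with keep_mask_p standing in
for whatever profitability/support check ends up being used:

   tree_code tcode = TREE_CODE (op0);	/* Default: use the mask.  */
   if (TREE_CODE (op0) == SSA_NAME)
     if (gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0)))
       {
	 tree_code def_code = gimple_assign_rhs_code (def_stmt);
	 if (TREE_CODE_CLASS (def_code) == tcc_comparison && !keep_mask_p)
	   {
	     /* Fold the comparison into the selection.  */
	     tcode = def_code;
	     op0a = gimple_assign_rhs1 (def_stmt);
	     op0b = gimple_assign_rhs2 (def_stmt);
	   }
       }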

Thanks,
Richard
Iain Buclaw via Gcc-patches April 6, 2020, 12:30 p.m. | #2
On Mon, Apr 6, 2020 at 11:18 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
> [...]
> What was the reason for making this a separate pass, rather than doing
> it as part of veclower?  If we do them separately, then it's harder for
> veclower to know which VEC_COND_EXPRs it needs to open-code.  (OK, so
> that's a general problem between veclower and expand already, but it
> seems like the new approach could help to move away from that by
> doing the instruction selection directly in veclower.)

As the name of the pass suggests, it was supposed to be the starting point
for doing all the "complex" (multi-GIMPLE-stmt matching) RTL expansion tricks.

But most importantly, veclower runs too early to catch CSE opportunities
from loop opts on the conditions, and if veclower lowers things then we
also want CSE to clean up its mess.  I guess we also do not want veclower
to be done before vectorization, since it should be easier to re-vectorize
from unsupported vector code than from what veclower makes out of it ...
catch-22.

So I consider pass placement a secondary issue for now.

> [...]
>
> Thanks,
> Richard
Iain Buclaw via Gcc-patches April 6, 2020, 12:33 p.m. | #3
On Mon, Apr 6, 2020 at 11:18 AM Richard Sandiford
<richard.sandiford@arm.com> wrote:
> [...]
> > +      unsigned int used_vec_cond_exprs = 0;
> > +      gimple *use_stmt;
> > +      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
> > +     {
> > +       gassign *assign = dyn_cast<gassign *> (use_stmt);
> > +       if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
> > +           && gimple_assign_rhs1 (assign) == op0)
> > +         used_vec_cond_exprs++;
> > +     }
>
> This looks like it's quadratic in the worst case.  Could we check
> this in a different way?

We could remember an SSA name's cond-expr uses and thus only compute them
once.
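
A rough, untested sketch of such caching, roughly what the later v2 patch
in this thread does with its vec_cond_ssa_name_uses map:

   static unsigned int
   vec_cond_expr_uses (tree op0, hash_map<tree, unsigned int> *cache)
   {
     bool existed;
     unsigned int &count = cache->get_or_insert (op0, &existed);
     if (!existed)
       {
	 imm_use_iterator imm_iter;
	 gimple *use_stmt;
	 count = 0;
	 FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
	   if (gassign *assign = dyn_cast<gassign *> (use_stmt))
	     if (gimple_assign_rhs_code (assign) == VEC_COND_EXPR
		 && gimple_assign_rhs1 (assign) == op0)
	       count++;
       }
     return count;
   }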

> [...]
>
> Thanks,
> Richard
Martin Liška May 21, 2020, 12:51 p.m. | #4
Hi.

Coming back to this, I noticed that the ppc64le target build is broken
due to:

g++  -fno-PIE -c   -g   -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE   -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. -I. -I/home/marxin/Programming/gcc/gcc -I/home/marxin/Programming/gcc/gcc/. -I/home/marxin/Programming/gcc/gcc/../include -I/home/marxin/Programming/gcc/gcc/../libcpp/include  -I/home/marxin/Programming/gcc/gcc/../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libdecnumber/dpd -I../libdecnumber -I/home/marxin/Programming/gcc/gcc/../libbacktrace   -o insn-emit.o -MT insn-emit.o -MMD -MP -MF ./.deps/insn-emit.TPo insn-emit.c
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error: vcondv4sfv4sf cannot FAIL
   357 |     FAIL;
       |           ^
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error: vcondv2dfv2df cannot FAIL
   357 |     FAIL;
       |           ^
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:374:11: error: vcondv16qiv16qi cannot FAIL
   374 |     FAIL;
       |           ^
/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:374:11: error: vcondv8hiv8hi cannot FAIL
   374 |     FAIL;
       |           ^
...


which is caused by the 4 added optabs:

+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)

looking at the generator:

Breakpoint 6, gen_expand (info=0x7fffffffe160) at /home/marxin/Programming/gcc/gcc/genemit.c:516
516	      if (find_optab (&p, XSTR (expand, 0)))
(gdb) bt
#0  emit_c_code (code=0x7fa0f0 "{\n  if (rs6000_emit_vector_cond_expr (operands[0], operands[1], operands[2],\n\t\t\t\t    operands[3], operands[4], operands[5]))\n    DONE;\n  else\n    FAIL;\n}", can_fail_p=false, name=0x7fa190 "vcondv4sfv4sf")
     at /home/marxin/Programming/gcc/gcc/genemit.c:306
#1  0x00000000004039b5 in gen_expand (info=0x7fffffffe160) at /home/marxin/Programming/gcc/gcc/genemit.c:522
#2  0x0000000000404912 in main (argc=4, argv=0x7fffffffe288) at /home/marxin/Programming/gcc/gcc/genemit.c:916

I get there due to:

B- │516               if (find_optab (&p, XSTR (expand, 0)))│
    │517                 {                                   │
    │518                   gcc_assert (p.op < NUM_OPTABS);   │
    │519                   if (nofail_optabs[p.op])          │
    │520                     can_fail_p = false;             │
    │521                 }                                   │


#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
   nofail_optabs[OPTAB##_optab] = true;
#include "internal-fn.def"

Any hint about what's wrong? Note that x86_64-linux-gnu is fine.
Am I missing a target hook?

Martin
Martin Liška May 21, 2020, 1:29 p.m. | #5
Adding Segher to CC, he can help us.

Martin

On 5/21/20 2:51 PM, Martin Liška wrote:
> Hi.
>
> Coming back to this, I noticed that the ppc64le target build is broken
> due to:
> [...]
> Any hint about what's wrong? Note that x86_64-linux-gnu is fine.
> Am I missing a target hook?
>
> Martin
Segher Boessenkool May 21, 2020, 8:16 p.m. | #6
Hi!

On Thu, May 21, 2020 at 03:29:49PM +0200, Martin Liška wrote:
> Adding Segher to CC, he can help us.


Oh dear.  Are you sure?

> On 5/21/20 2:51 PM, Martin Liška wrote:
> >Coming back to this, I noticed that the ppc64le target build is broken
> >due to:
> >[...]
> >/home/marxin/Programming/gcc/gcc/config/rs6000/vector.md:357:11: error:
> >vcondv4sfv4sf cannot FAIL
> >   357 |     FAIL;

Is it new that vcond cannot FAIL?  Because we have done that for years.

Since this breaks bootstrap on a primary target, please revert the patch
until it is sorted.

> >which is caused by the 4 added optabs:
> >
> >+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
> >+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
> >+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
> >+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
> >[...]
> >#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> >   nofail_optabs[OPTAB##_optab] = true;

So yes it is new.  Please fix :-(

> >Any hint about what's wrong? Note that x86_64-linux-gnu is fine.
> >Am I missing a target hook?

There is a new IFN that requires the existing optabs to never fail.  But
they *do* sometimes fail.  That is what I understand from this anyway,
please correct if needed :-)

We can make the rs6000 patterns never FAIL if that is a good idea (I am
not convinced however), but this should be documented, and all existing
targets need to be checked.

In general it is not pleasant at all to have patterns that cannot FAIL,
it makes writing a (new) port much harder, and there can be cases where
there is no sane code at all that can be generated for some cases, etc.


Segher
Iain Buclaw via Gcc-patches May 22, 2020, 11:14 a.m. | #7
On Thu, May 21, 2020 at 10:17 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
> [...]
> > >I get there due to:
> > >
> > >B- │516               if (find_optab (&p, XSTR (expand, 0)))│
> > >    │517                 {                                   │
> > >    │518                   gcc_assert (p.op < NUM_OPTABS);   │
> > >    │519                   if (nofail_optabs[p.op])          │
> > >    │520                     can_fail_p = false;             │
> > >    │521                 }                                   │

OK, so this is an "artifact" of direct internal functions.  We do check that
expansion does not actually FAIL before emitting calls to those IFNs.

I guess this simply makes direct internal functions not a 100% match for
our use and the way out is to add regular internal functions mapping to
the optabs.  That is, I guess, for direct-internal functions it should be
enough to check direct_internal_function_supported_p which it is not
for the case of vcond*.

Richard, do you agree?

Thanks,
Richard.

> [...]
>
> Segher
Richard Sandiford May 26, 2020, 10:15 a.m. | #8
Richard Biener <richard.guenther@gmail.com> writes:
> [...]
> OK, so this is an "artifact" of direct internal functions.  We do check that
> expansion does not actually FAIL before emitting calls to those IFNs.
>
> I guess this simply makes direct internal functions not a 100% match for
> our use and the way out is to add regular internal functions mapping to
> the optabs.  That is, I guess, for direct-internal functions it should be
> enough to check direct_internal_function_supported_p which it is not
> for the case of vcond*.
>
> Richard, do you agree?

Sorry for the late reply, been off for a few days.

I guess that would be OK for VCOND(U) as an intermediate step,
but long term, I think we should try to make all VCOND* directly-mapped.
If we're doing instruction selection on gimple (a good thing IMO)
we need to know before expand whether an operation is supported.

So longer-term, I think we should replace VCOND(U) with individual ifns,
like for VCONDEQ.  We could reduce the number of optabs needed by
canonicalising greater-based tests to lesser-based tests.
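
A minimal, untested sketch of that canonicalisation, using the existing
swap_tree_comparison helper:

   /* a > b  ->  b < a, so only lesser-based optabs are needed.  */
   if (tcode == GT_EXPR || tcode == GE_EXPR)
     {
       std::swap (op0a, op0b);
       tcode = swap_tree_comparison (tcode);
     }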

Thanks,
Richard
Martin Liška May 27, 2020, 2:04 p.m. | #9
On 5/26/20 12:15 PM, Richard Sandiford wrote:
> So longer-term, I think we should replace VCOND(U) with individual ifns,
> like for VCONDEQ.  We could reduce the number of optabs needed by
> canonicalising greater-based tests to lesser-based tests.

Hello.

Thanks for the feedback. So would it be possible to go with something
like DEF_INTERNAL_OPTAB_CAN_FAIL (see the attachment)?

I'm sending the complete patch that survives bootstrap and regression
tests on x86_64-linux-gnu and ppc64le-linux-gnu.

Martin
diff --git a/gcc/genemit.c b/gcc/genemit.c
index 84d07d388ee..23c89dbf4e9 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -857,6 +857,9 @@ main (int argc, const char **argv)
 
 #define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
   nofail_optabs[OPTAB##_optab] = true;
+
+#define DEF_INTERNAL_OPTAB_CAN_FAIL(OPTAB) \
+  nofail_optabs[OPTAB##_optab] = false;
 #include "internal-fn.def"
 
   /* Assign sequential codes to all entries in the machine description
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 0c6fc371190..373273de2c2 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 				   UNSIGNED_OPTAB, TYPE)
      DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
      DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
+     DEF_INTERNAL_OPTAB_CAN_FAIL (OPTAB)
 
    where NAME is the name of the function, FLAGS is a set of
    ECF_* flags and FNSPEC is a string describing functions fnspec.
@@ -86,7 +87,10 @@ along with GCC; see the file COPYING3.  If not see
 
    where STMT is the statement that performs the call.  These are generated
    automatically for optab functions and call out to a function or macro
-   called expand_<TYPE>_optab_fn.  */
+   called expand_<TYPE>_optab_fn.
+
+   DEF_INTERNAL_OPTAB_CAN_FAIL defines tables that are used for GIMPLE
+   instruction selection and do not map directly to instructions.  */
 
 #ifndef DEF_INTERNAL_FN
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC)
@@ -118,6 +122,10 @@ along with GCC; see the file COPYING3.  If not see
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_OPTAB_CAN_FAIL
+#define DEF_INTERNAL_OPTAB_CAN_FAIL(OPTAB)
+#endif
+
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
@@ -141,6 +149,11 @@ DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
 DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
 DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
 
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcond)
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcondu)
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcondeq)
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcond_mask)
+
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
 		       check_raw_ptrs, check_ptrs)
@@ -385,4 +398,5 @@ DEF_INTERNAL_FN (NOP, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 #undef DEF_INTERNAL_FLT_FLOATN_FN
 #undef DEF_INTERNAL_SIGNED_OPTAB_FN
 #undef DEF_INTERNAL_OPTAB_FN
+#undef DEF_INTERNAL_OPTAB_CAN_FAIL
 #undef DEF_INTERNAL_FN
From 981952917dea9aaef3be13375fbf452566d926b3 Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Mon, 9 Mar 2020 13:23:03 +0100
Subject: [PATCH] Lower VEC_COND_EXPR into internal functions.

gcc/ChangeLog:

2020-03-30  Martin Liska  <mliska@suse.cz>

	* expr.c (expand_expr_real_2): Add gcc_unreachable; we should not
	reach this path.
	(do_store_flag): Likewise here.
	* internal-fn.c (vec_cond_mask_direct): New.
	(vec_cond_direct): Likewise.
	(vec_condu_direct): Likewise.
	(vec_condeq_direct): Likewise.
	(expand_vect_cond_optab_fn): Move from optabs.c.
	(expand_vec_cond_optab_fn): New alias.
	(expand_vec_condu_optab_fn): Likewise.
	(expand_vec_condeq_optab_fn): Likewise.
	(expand_vect_cond_mask_optab_fn): Moved from optabs.c.
	(expand_vec_cond_mask_optab_fn): New alias.
	(direct_vec_cond_mask_optab_supported_p): New.
	(direct_vec_cond_optab_supported_p): Likewise.
	(direct_vec_condu_optab_supported_p): Likewise.
	(direct_vec_condeq_optab_supported_p): Likewise.
	* internal-fn.def (DEF_INTERNAL_OPTAB_CAN_FAIL): New macro.
	(VCOND): New internal optab function.
	(VCONDU): Likewise.
	(VCONDEQ): Likewise.
	(VCOND_MASK): Likewise.
	* optabs.c (expand_vec_cond_mask_expr): Removed.
	(expand_vec_cond_expr): Likewise.
	* optabs.h (expand_vec_cond_expr): Likewise.
	(vector_compare_rtx): Likewise.
	* passes.def: Add pass_gimple_isel.
	* tree-cfg.c (verify_gimple_assign_ternary): Add new
	GIMPLE check.
	* tree-pass.h (make_pass_gimple_isel): New.
	* tree-ssa-forwprop.c (pass_forwprop::execute): Do not forward
	to already lowered VEC_COND_EXPR.
	* tree-vect-generic.c (expand_vector_divmod): Expand to SSA_NAME.
	(expand_vector_condition): Expand tcc_comparison of a VEC_COND_EXPR
	into a SSA_NAME.
	(gimple_expand_vec_cond_expr): New.
	(gimple_expand_vec_cond_exprs): New.
	(class pass_gimple_isel): New.
	(make_pass_gimple_isel): New.
	* genemit.c (DEF_INTERNAL_OPTAB_CAN_FAIL): Support optabs that
	can fail.
---
 gcc/expr.c              |  25 +----
 gcc/genemit.c           |   3 +
 gcc/internal-fn.c       |  89 +++++++++++++++
 gcc/internal-fn.def     |  21 +++-
 gcc/optabs.c            | 124 +--------------------
 gcc/optabs.h            |   7 +-
 gcc/passes.def          |   1 +
 gcc/tree-cfg.c          |   8 ++
 gcc/tree-pass.h         |   1 +
 gcc/tree-ssa-forwprop.c |   6 +
 gcc/tree-vect-generic.c | 237 +++++++++++++++++++++++++++++++++++++++-
 11 files changed, 367 insertions(+), 155 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index dfbeae71518..a757394f436 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9205,17 +9205,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       if (temp != 0)
 	return temp;
 
-      /* For vector MIN <x, y>, expand it a VEC_COND_EXPR <x <= y, x, y>
-	 and similarly for MAX <x, y>.  */
       if (VECTOR_TYPE_P (type))
-	{
-	  tree t0 = make_tree (type, op0);
-	  tree t1 = make_tree (type, op1);
-	  tree comparison = build2 (code == MIN_EXPR ? LE_EXPR : GE_EXPR,
-				    type, t0, t1);
-	  return expand_vec_cond_expr (type, comparison, t0, t1,
-				       original_target);
-	}
+	gcc_unreachable ();
 
       /* At this point, a MEM target is no longer useful; we will get better
 	 code without it.  */
@@ -9804,10 +9795,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
-    case VEC_COND_EXPR:
-      target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
-      return target;
-
     case VEC_DUPLICATE_EXPR:
       op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
       target = expand_vector_broadcast (mode, op0);
@@ -12138,8 +12125,7 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   STRIP_NOPS (arg1);
 
   /* For vector typed comparisons emit code to generate the desired
-     all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
-     expander for this.  */
+     all-ones or all-zeros mask.  */
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
@@ -12147,12 +12133,7 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
 	  && expand_vec_cmp_expr_p (TREE_TYPE (arg0), ops->type, ops->code))
 	return expand_vec_cmp_expr (ops->type, ifexp, target);
       else
-	{
-	  tree if_true = constant_boolean_node (true, ops->type);
-	  tree if_false = constant_boolean_node (false, ops->type);
-	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
-				       if_false, target);
-	}
+	gcc_unreachable ();
     }
 
   /* Optimize (x % C1) == C2 or (x % C1) != C2 if it is beneficial
diff --git a/gcc/genemit.c b/gcc/genemit.c
index 84d07d388ee..23c89dbf4e9 100644
--- a/gcc/genemit.c
+++ b/gcc/genemit.c
@@ -857,6 +857,9 @@ main (int argc, const char **argv)
 
 #define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
   nofail_optabs[OPTAB##_optab] = true;
+
+#define DEF_INTERNAL_OPTAB_CAN_FAIL(OPTAB) \
+  nofail_optabs[OPTAB##_optab] = false;
 #include "internal-fn.def"
 
   /* Assign sequential codes to all entries in the machine description
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 5e9aa60721e..644f234e087 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "explow.h"
 
 /* The names of each internal function, indexed by function number.  */
 const char *const internal_fn_name_array[] = {
@@ -107,6 +108,10 @@ init_internal_fns ()
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
+#define vec_cond_mask_direct { 0, 0, false }
+#define vec_cond_direct { 0, 0, false }
+#define vec_condu_direct { 0, 0, false }
+#define vec_condeq_direct { 0, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
@@ -2548,6 +2553,86 @@ expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
 
+/* Expand VCOND, VCONDU and VCONDEQ optab internal functions.
+   The expansion of STMT happens based on the associated OPTAB.  */
+
+static void
+expand_vect_cond_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+  insn_code icode;
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0a = gimple_call_arg (stmt, 0);
+  tree op0b = gimple_call_arg (stmt, 1);
+  tree op1 = gimple_call_arg (stmt, 2);
+  tree op2 = gimple_call_arg (stmt, 3);
+  enum tree_code tcode = (tree_code) int_cst_value (gimple_call_arg (stmt, 4));
+
+  tree vec_cond_type = TREE_TYPE (lhs);
+  tree op_mode = TREE_TYPE (op0a);
+  bool unsignedp = TYPE_UNSIGNED (op_mode);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode cmp_op_mode = TYPE_MODE (op_mode);
+
+  icode = convert_optab_handler (optab, mode, cmp_op_mode);
+  rtx comparison
+    = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp, icode, 4);
+  rtx rtx_op1 = expand_normal (op1);
+  rtx rtx_op2 = expand_normal (op2);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_fixed_operand (&ops[3], comparison);
+  create_fixed_operand (&ops[4], XEXP (comparison, 0));
+  create_fixed_operand (&ops[5], XEXP (comparison, 1));
+  expand_insn (icode, 6, ops);
+}
+
+#define expand_vec_cond_optab_fn expand_vect_cond_optab_fn
+#define expand_vec_condu_optab_fn expand_vect_cond_optab_fn
+#define expand_vec_condeq_optab_fn expand_vect_cond_optab_fn
+
+/* Expand VCOND_MASK optab internal function.
+   The expansion of STMT happens based on the associated OPTAB.  */
+
+static void
+expand_vect_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[4];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0 = gimple_call_arg (stmt, 0);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
+  enum insn_code icode = convert_optab_handler (optab, mode, mask_mode);
+  rtx mask, rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  mask = expand_normal (op0);
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  mask = force_reg (mask_mode, mask);
+  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_input_operand (&ops[3], mask, mask_mode);
+  expand_insn (icode, 4, ops);
+}
+
+#define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
+
 static void
 expand_ABNORMAL_DISPATCHER (internal_fn, gcall *)
 {
@@ -3131,6 +3216,10 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_cond_mask_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_cond_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_condu_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_condeq_optab_supported_p multi_vector_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 1d190d492ff..373273de2c2 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 				   UNSIGNED_OPTAB, TYPE)
      DEF_INTERNAL_FLT_FN (NAME, FLAGS, OPTAB, TYPE)
      DEF_INTERNAL_INT_FN (NAME, FLAGS, OPTAB, TYPE)
+     DEF_INTERNAL_OPTAB_CAN_FAIL (OPTAB)
 
    where NAME is the name of the function, FLAGS is a set of
    ECF_* flags and FNSPEC is a string describing functions fnspec.
@@ -86,7 +87,10 @@ along with GCC; see the file COPYING3.  If not see
 
    where STMT is the statement that performs the call.  These are generated
    automatically for optab functions and call out to a function or macro
-   called expand_<TYPE>_optab_fn.  */
+   called expand_<TYPE>_optab_fn.
+
+   DEF_INTERNAL_OPTAB_CAN_FAIL defines tables that are used for GIMPLE
+   instruction selection and do not map directly to instructions.  */
 
 #ifndef DEF_INTERNAL_FN
 #define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC)
@@ -118,6 +122,10 @@ along with GCC; see the file COPYING3.  If not see
   DEF_INTERNAL_OPTAB_FN (NAME, FLAGS, OPTAB, TYPE)
 #endif
 
+#ifndef DEF_INTERNAL_OPTAB_CAN_FAIL
+#define DEF_INTERNAL_OPTAB_CAN_FAIL(OPTAB)
+#endif
+
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD, ECF_PURE, maskload, mask_load)
 DEF_INTERNAL_OPTAB_FN (LOAD_LANES, ECF_CONST, vec_load_lanes, load_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_LOAD_LANES, ECF_PURE,
@@ -136,6 +144,16 @@ DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
 		       vec_mask_store_lanes, mask_store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
+
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcond)
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcondu)
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcondeq)
+DEF_INTERNAL_OPTAB_CAN_FAIL (vcond_mask)
+
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
 		       check_raw_ptrs, check_ptrs)
@@ -380,4 +398,5 @@ DEF_INTERNAL_FN (NOP, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 #undef DEF_INTERNAL_FLT_FLOATN_FN
 #undef DEF_INTERNAL_SIGNED_OPTAB_FN
 #undef DEF_INTERNAL_OPTAB_FN
+#undef DEF_INTERNAL_OPTAB_CAN_FAIL
 #undef DEF_INTERNAL_FN
diff --git a/gcc/optabs.c b/gcc/optabs.c
index d85ce47f762..5c19e4271e7 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5439,7 +5439,7 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
    first comparison operand for insn ICODE.  Do not generate the
    compare instruction itself.  */
 
-static rtx
+rtx
 vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
 		    tree t_op0, tree t_op1, bool unsignedp,
 		    enum insn_code icode, unsigned int opno)
@@ -5806,128 +5806,6 @@ expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
-/* Generate insns for a VEC_COND_EXPR with mask, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_mask_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-			   rtx target)
-{
-  class expand_operand ops[4];
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
-  enum insn_code icode = get_vcond_mask_icode (mode, mask_mode);
-  rtx mask, rtx_op1, rtx_op2;
-
-  if (icode == CODE_FOR_nothing)
-    return 0;
-
-  mask = expand_normal (op0);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  mask = force_reg (mask_mode, mask);
-  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_input_operand (&ops[3], mask, mask_mode);
-  expand_insn (icode, 4, ops);
-
-  return ops[0].value;
-}
-
-/* Generate insns for a VEC_COND_EXPR, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-		      rtx target)
-{
-  class expand_operand ops[6];
-  enum insn_code icode;
-  rtx comparison, rtx_op1, rtx_op2;
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode cmp_op_mode;
-  bool unsignedp;
-  tree op0a, op0b;
-  enum tree_code tcode;
-
-  if (COMPARISON_CLASS_P (op0))
-    {
-      op0a = TREE_OPERAND (op0, 0);
-      op0b = TREE_OPERAND (op0, 1);
-      tcode = TREE_CODE (op0);
-    }
-  else
-    {
-      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
-      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
-	  != CODE_FOR_nothing)
-	return expand_vec_cond_mask_expr (vec_cond_type, op0, op1,
-					  op2, target);
-      /* Fake op0 < 0.  */
-      else
-	{
-	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
-		      == MODE_VECTOR_INT);
-	  op0a = op0;
-	  op0b = build_zero_cst (TREE_TYPE (op0));
-	  tcode = LT_EXPR;
-	}
-    }
-  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
-
-
-  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
-	      && known_eq (GET_MODE_NUNITS (mode),
-			   GET_MODE_NUNITS (cmp_op_mode)));
-
-  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
-  if (icode == CODE_FOR_nothing)
-    {
-      if (tcode == LT_EXPR
-	  && op0a == op0
-	  && TREE_CODE (op0) == VECTOR_CST)
-	{
-	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
-	     into a constant when only get_vcond_eq_icode is supported.
-	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
-	  unsigned HOST_WIDE_INT nelts;
-	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
-	    {
-	      if (VECTOR_CST_STEPPED_P (op0))
-		return 0;
-	      nelts = vector_cst_encoded_nelts (op0);
-	    }
-	  for (unsigned int i = 0; i < nelts; ++i)
-	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
-	      return 0;
-	  tcode = NE_EXPR;
-	}
-      if (tcode == EQ_EXPR || tcode == NE_EXPR)
-	icode = get_vcond_eq_icode (mode, cmp_op_mode);
-      if (icode == CODE_FOR_nothing)
-	return 0;
-    }
-
-  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
-				   icode, 4);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
-  expand_insn (icode, 6, ops);
-  return ops[0].value;
-}
-
 /* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
    Use TARGET for the result if nonnull and convenient.  */
 
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 5bd19503a0a..7c2ec257cb0 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -321,9 +321,6 @@ extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
 /* Generate code for vector comparison.  */
 extern rtx expand_vec_cmp_expr (tree, tree, rtx);
 
-/* Generate code for VEC_COND_EXPR.  */
-extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
-
 /* Generate code for VEC_SERIES_EXPR.  */
 extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
 
@@ -364,5 +361,9 @@ extern void expand_jump_insn (enum insn_code icode, unsigned int nops,
 			      class expand_operand *ops);
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
+extern rtx vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
+			       tree t_op0, tree t_op1, bool unsignedp,
+			       enum insn_code icode, unsigned int opno);
+
 
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/passes.def b/gcc/passes.def
index 92cbe587a8a..e9f59d756c9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -398,6 +398,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cleanup_eh);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
+  NEXT_PASS (pass_gimple_isel);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index d06a479e570..16ff06fbf88 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4199,6 +4199,14 @@ verify_gimple_assign_ternary (gassign *stmt)
 	  debug_generic_expr (rhs1_type);
 	  return true;
 	}
+      else if (cfun->curr_properties & PROP_gimple_lvec
+	       && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_comparison)
+	{
+	  error ("the first argument of %<VEC_COND_EXPR%> cannot be "
+		 "a %<GENERIC%> tree comparison expression");
+	  debug_generic_expr (rhs1);
+	  return true;
+	}
       /* Fallthrough.  */
     case COND_EXPR:
       if (!is_gimple_val (rhs1)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 576b3f67434..4efece1b35b 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -626,6 +626,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gimple_isel (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 759baf56897..fce392e204c 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -3125,6 +3125,12 @@ pass_forwprop::execute (function *fun)
 		    if (code == COND_EXPR
 			|| code == VEC_COND_EXPR)
 		      {
+			/* Do not propagate into VEC_COND_EXPRs after they
+			   have been lowered by the vector lowering pass.  */
+			if (code == VEC_COND_EXPR
+			    && (fun->curr_properties & PROP_gimple_lvec))
+			  break;
+
 			/* In this case the entire COND_EXPR is in rhs1. */
 			if (forward_propagate_into_cond (&gsi))
 			  {
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index a7fe83da0e3..8f6d63f01c5 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -694,12 +694,14 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	  if (addend == NULL_TREE
 	      && expand_vec_cond_expr_p (type, type, LT_EXPR))
 	    {
-	      tree zero, cst, cond, mask_type;
-	      gimple *stmt;
+	      tree zero, cst, mask_type, mask;
+	      gimple *stmt, *cond;
 
 	      mask_type = truth_type_for (type);
 	      zero = build_zero_cst (type);
-	      cond = build2 (LT_EXPR, mask_type, op0, zero);
+	      mask = make_ssa_name (mask_type);
+	      cond = gimple_build_assign (mask, LT_EXPR, op0, zero);
+	      gsi_insert_before (gsi, cond, GSI_SAME_STMT);
 	      tree_vector_builder vec (type, nunits, 1);
 	      for (i = 0; i < nunits; i++)
 		vec.quick_push (build_int_cst (TREE_TYPE (type),
@@ -707,8 +709,8 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 						<< shifts[i]) - 1));
 	      cst = vec.build ();
 	      addend = make_ssa_name (type);
-	      stmt = gimple_build_assign (addend, VEC_COND_EXPR, cond,
-					  cst, zero);
+	      stmt
+		= gimple_build_assign (addend, VEC_COND_EXPR, mask, cst, zero);
 	      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 	    }
 	}
@@ -964,7 +966,17 @@ expand_vector_condition (gimple_stmt_iterator *gsi)
     }
 
   if (expand_vec_cond_expr_p (type, TREE_TYPE (a1), TREE_CODE (a)))
-    return;
+    {
+      if (a_is_comparison)
+	{
+	  a = gimplify_build2 (gsi, TREE_CODE (a), TREE_TYPE (a), a1, a2);
+	  gimple_assign_set_rhs1 (stmt, a);
+	  update_stmt (stmt);
+	  return;
+	}
+      gcc_assert (TREE_CODE (a) == SSA_NAME || TREE_CODE (a) == VECTOR_CST);
+      return;
+    }
 
   /* Handle vector boolean types with bitmasks.  If there is a comparison
      and we can expand the comparison into the vector boolean bitmask,
@@ -2241,6 +2253,176 @@ expand_vector_operations (void)
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
 
+/* Expand a VEC_COND_EXPR gimple assignment into a call to the internal
+   function that matches the selected expansion strategy.  */
+
+static gimple *
+gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
+			     hash_map<tree, unsigned int> *vec_cond_ssa_name_uses)
+{
+  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
+  enum tree_code code;
+  enum tree_code tcode;
+  machine_mode cmp_op_mode;
+  bool unsignedp;
+  enum insn_code icode;
+  imm_use_iterator imm_iter;
+
+  /* Only consider code == GIMPLE_ASSIGN.  */
+  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
+  if (!stmt)
+    return NULL;
+
+  code = gimple_assign_rhs_code (stmt);
+  if (code != VEC_COND_EXPR)
+    return NULL;
+
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+  tree op2 = gimple_assign_rhs3 (stmt);
+  lhs = gimple_assign_lhs (stmt);
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+
+  gcc_assert (!COMPARISON_CLASS_P (op0));
+  if (TREE_CODE (op0) == SSA_NAME)
+    {
+      unsigned int used_vec_cond_exprs = 0;
+      unsigned int *slot = vec_cond_ssa_name_uses->get (op0);
+      if (slot)
+	used_vec_cond_exprs = *slot;
+      else
+	{
+	  gimple *use_stmt;
+	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
+	    {
+	      gassign *assign = dyn_cast<gassign *> (use_stmt);
+	      if (assign != NULL
+		  && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
+		  && gimple_assign_rhs1 (assign) == op0)
+		used_vec_cond_exprs++;
+	    }
+	  vec_cond_ssa_name_uses->put (op0, used_vec_cond_exprs);
+	}
+
+      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
+      if (def_stmt)
+	{
+	  tcode = gimple_assign_rhs_code (def_stmt);
+	  op0a = gimple_assign_rhs1 (def_stmt);
+	  op0b = gimple_assign_rhs2 (def_stmt);
+
+	  tree op0a_type = TREE_TYPE (op0a);
+	  if (used_vec_cond_exprs >= 2
+	      && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
+		  != CODE_FOR_nothing)
+	      && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
+	    {
+	      /* Keep the SSA name and use vcond_mask.  */
+	      tcode = TREE_CODE (op0);
+	    }
+	}
+      else
+	tcode = TREE_CODE (op0);
+    }
+  else
+    tcode = TREE_CODE (op0);
+
+  if (TREE_CODE_CLASS (tcode) != tcc_comparison)
+    {
+      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
+      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
+	  != CODE_FOR_nothing)
+	return gimple_build_call_internal (IFN_VCOND_MASK, 3, op0, op1, op2);
+      /* Fake op0 < 0.  */
+      else
+	{
+	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
+		      == MODE_VECTOR_INT);
+	  op0a = op0;
+	  op0b = build_zero_cst (TREE_TYPE (op0));
+	  tcode = LT_EXPR;
+	}
+    }
+  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+
+
+  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
+	      && known_eq (GET_MODE_NUNITS (mode),
+			   GET_MODE_NUNITS (cmp_op_mode)));
+
+  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    {
+      if (tcode == LT_EXPR
+	  && op0a == op0
+	  && TREE_CODE (op0) == VECTOR_CST)
+	{
+	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
+	     into a constant when only get_vcond_eq_icode is supported.
+	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
+	  unsigned HOST_WIDE_INT nelts;
+	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
+	    {
+	      if (VECTOR_CST_STEPPED_P (op0))
+		gcc_unreachable ();
+	      nelts = vector_cst_encoded_nelts (op0);
+	    }
+	  for (unsigned int i = 0; i < nelts; ++i)
+	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
+	      gcc_unreachable ();
+	  tcode = NE_EXPR;
+	}
+      if (tcode == EQ_EXPR || tcode == NE_EXPR)
+	{
+	  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+	  return gimple_build_call_internal (IFN_VCONDEQ, 5, op0a, op0b, op1,
+					     op2, tcode_tree);
+	}
+    }
+
+  gcc_assert (icode != CODE_FOR_nothing);
+  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+  return gimple_build_call_internal (unsignedp ? IFN_VCONDU : IFN_VCOND,
+				     5, op0a, op0b, op1, op2, tcode_tree);
+}
+
+/* Iterate all gimple statements and try to expand
+   VEC_COND_EXPR assignments.  */
+
+static unsigned int
+gimple_expand_vec_cond_exprs (void)
+{
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+  bool cfg_changed = false;
+  hash_map<tree, unsigned int> vec_cond_ssa_name_uses;
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *g = gimple_expand_vec_cond_expr (&gsi,
+						   &vec_cond_ssa_name_uses);
+	  if (g != NULL)
+	    {
+	      tree lhs = gimple_assign_lhs (gsi_stmt (gsi));
+	      gimple_set_lhs (g, lhs);
+	      gsi_replace (&gsi, g, false);
+	    }
+	  /* ???  If we do not cleanup EH then we will ICE in
+	     verification.  But in reality we have created wrong-code
+	     as we did not properly transition EH info and edges to
+	     the piecewise computations.  */
+	  if (maybe_clean_eh_stmt (gsi_stmt (gsi))
+	      && gimple_purge_dead_eh_edges (bb))
+	    cfg_changed = true;
+	}
+    }
+
+  return cfg_changed ? TODO_cleanup_cfg : 0;
+}
+
 namespace {
 
 const pass_data pass_data_lower_vector =
@@ -2324,4 +2506,47 @@ make_pass_lower_vector_ssa (gcc::context *ctxt)
   return new pass_lower_vector_ssa (ctxt);
 }
 
+namespace {
+
+const pass_data pass_data_gimple_isel =
+{
+  GIMPLE_PASS, /* type */
+  "isel", /* name */
+  OPTGROUP_VEC, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_gimple_isel : public gimple_opt_pass
+{
+public:
+  pass_gimple_isel (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_gimple_isel, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return true;
+    }
+
+  virtual unsigned int execute (function *)
+    {
+      return gimple_expand_vec_cond_exprs ();
+    }
+
+}; // class pass_gimple_isel
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_gimple_isel (gcc::context *ctxt)
+{
+  return new pass_gimple_isel (ctxt);
+}
+
 #include "gt-tree-vect-generic.h"
-- 
2.26.2
Richard Sandiford May 27, 2020, 4:13 p.m. | #10
Martin Liška <mliska@suse.cz> writes:
> On 5/26/20 12:15 PM, Richard Sandiford wrote:
>> So longer-term, I think we should replace VCOND(U) with individual ifns,
>> like for VCONDEQ.  We could reduce the number of optabs needed by
>> canonicalising greater-based tests to lesser-based tests.
>
> Hello.
>
> Thanks for the feedback. So would it be possible to go with something
> like DEF_INTERNAL_OPTAB_CAN_FAIL (see the attachment)?

It doesn't look like this will solve the problem.  The reason that we
don't allow optabs for directly-mapped IFNs to FAIL is that:

  expand_insn (icode, 6, ops);

will (deliberately) ICE when the pattern FAILs.  Code that copes with
FAILing optabs instead needs to do:

  rtx_insn *watermark = get_last_insn (); <-- position where it should go.
  ...
  if (maybe_expand_insn (icode, 6, ops))
    {
      ...Success...;
    }

  delete_insns_since (watermark);
  ...fallback code that implements the IFN without optab support...

At this point the IFN isn't really directly-mapped in the intended sense:
the optab is “just” a way of optimising the IFN.

So I think the effect of the patch will be to suppress the build failure,
but instead ICE for PowerPC when the FAIL condition is hit.  It might
be quite difficult to trigger though.  (That's why the static checking
is there. :-))

I think instead we should treat VCOND(U) as not directly-mapped,
as Richard suggested (IIRC).  The internal-fn.c code should then handle
the case in which we have an IFN_VCOND(U) call and the associated
optab fails.  Of course, this is only going to be exercised on targets
like powerpc* that have failing patterns, so it'll need testing there.

What I meant by the quote above is that I think this shows the flaw in
using IFN_VCOND(U) rather than splitting it up further.  Longer term,
we should have a separate IFN_VCOND* and optab for each necessary
condition.  There would then be no need (IMO) to allow the patterns
to FAIL, and we could use directly-mapped IFNs with no fallback.
There'd also be no need for the tree comparison operand to the IFN.
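
For concreteness, such per-condition functions might look like the
following in internal-fn.def -- a sketch only, with illustrative names
and optabs that do not exist today:

  /* Hypothetical: one directly-mapped IFN per comparison.  Greater-based
     tests (GT/GE) would be canonicalised to these lesser-based ones by
     swapping the operands, so they need no entries of their own.  */
  DEF_INTERNAL_OPTAB_FN (VCOND_LT, ECF_CONST | ECF_NOTHROW, vcond_lt, vec_cond)
  DEF_INTERNAL_OPTAB_FN (VCOND_LE, ECF_CONST | ECF_NOTHROW, vcond_le, vec_cond)
  DEF_INTERNAL_OPTAB_FN (VCOND_EQ, ECF_CONST | ECF_NOTHROW, vcond_eq, vec_cond)
  DEF_INTERNAL_OPTAB_FN (VCOND_NE, ECF_CONST | ECF_NOTHROW, vcond_ne, vec_cond)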

Thanks,
Richard
Iain Buclaw via Gcc-patches May 27, 2020, 4:32 p.m. | #11
On May 27, 2020 6:13:24 PM GMT+02:00, Richard Sandiford <richard.sandiford@arm.com> wrote:
>Martin Liška <mliska@suse.cz> writes:
>> On 5/26/20 12:15 PM, Richard Sandiford wrote:
>>> So longer-term, I think we should replace VCOND(U) with individual ifns,
>>> like for VCONDEQ.  We could reduce the number of optabs needed by
>>> canonicalising greater-based tests to lesser-based tests.
>>
>> Hello.
>>
>> Thanks for the feedback. So would it be possible to go with something
>> like DEF_INTERNAL_OPTAB_CAN_FAIL (see the attachment)?
>
>It doesn't look like this will solve the problem.  The reason that we
>don't allow optabs for directly-mapped IFNs to FAIL is that:
>
>  expand_insn (icode, 6, ops);
>
>will (deliberately) ICE when the pattern FAILs.  Code that copes with
>FAILing optabs instead needs to do:
>
>  rtx_insn *watermark = get_last_insn (); <-- position where it should go.
>  ...
>  if (maybe_expand_insn (icode, 6, ops))
>    {
>      ...Success...;
>    }
>
>  delete_insns_since (watermark);
>  ...fallback code that implements the IFN without optab support...
>
>At this point the IFN isn't really directly-mapped in the intended sense:
>the optab is “just” a way of optimising the IFN.
>
>So I think the effect of the patch will be to suppress the build failure,
>but instead ICE for PowerPC when the FAIL condition is hit.  It might
>be quite difficult to trigger though.  (That's why the static checking
>is there. :-))
>
>I think instead we should treat VCOND(U) as not directly-mapped,
>as Richard suggested (IIRC).  The internal-fn.c code should then handle
>the case in which we have an IFN_VCOND(U) call and the associated
>optab fails.  Of course, this is only going to be exercised on targets
>like powerpc* that have failing patterns, so it'll need testing there.
>
>What I meant by the quote above is that I think this shows the flaw in
>using IFN_VCOND(U) rather than splitting it up further.  Longer term,
>we should have a separate IFN_VCOND* and optab for each necessary
>condition.  There would then be no need (IMO) to allow the patterns
>to FAIL, and we could use directly-mapped IFNs with no fallback.
>There'd also be no need for the tree comparison operand to the IFN.

That might indeed be a good idea.

Richard.

>Thanks,
>Richard
Martin Liška May 28, 2020, 2:46 p.m. | #12
Hi.

There's a new patch that adds normal internal functions for the 4
VCOND* functions.

The patch survives bootstrap and regression
tests on x86_64-linux-gnu and ppc64le-linux-gnu.

Thoughts?
Martin
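
For reference, the transform the new isel pass performs is, schematically
(hand-written GIMPLE, SSA names illustrative; the comparison code is
really passed as an integer constant):

  mask_1 = a_2 < b_3;
  res_4 = VEC_COND_EXPR <mask_1, c_5, d_6>;

becomes, when the vcond/vcondu optab is chosen,

  res_4 = .VCOND (a_2, b_3, c_5, d_6, LT_EXPR);

or, when the comparison result is kept and vcond_mask is chosen,

  res_4 = .VCOND_MASK (mask_1, c_5, d_6);
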
From 9a8880a601c7820eb2d0c9104367ea454571681e Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Mon, 9 Mar 2020 13:23:03 +0100
Subject: [PATCH] Lower VEC_COND_EXPR into internal functions.

gcc/ChangeLog:

2020-03-30  Martin Liska  <mliska@suse.cz>

	* expr.c (expand_expr_real_2): Put gcc_unreachable, we should not
	reach this path.
	(do_store_flag): Likewise here.
	* internal-fn.c (expand_vect_cond_optab_fn): New.
	(expand_VCOND): Likewise.
	(expand_VCONDU): Likewise.
	(expand_VCONDEQ): Likewise.
	(expand_vect_cond_mask_optab_fn): Likewise.
	(expand_VCOND_MASK): Likewise.
	* internal-fn.def (VCOND): New.
	(VCONDU): Likewise.
	(VCONDEQ): Likewise.
	(VCOND_MASK): Likewise.
	* optabs.c (expand_vec_cond_mask_expr): Removed.
	(expand_vec_cond_expr): Likewise.
	* optabs.h (expand_vec_cond_expr): Likewise.
	(vector_compare_rtx): Likewise.
	* passes.def: Add pass_gimple_isel.
	* tree-cfg.c (verify_gimple_assign_ternary): Add new
	GIMPLE check.
	* tree-pass.h (make_pass_gimple_isel): New.
	* tree-ssa-forwprop.c (pass_forwprop::execute): Do not forward
	to already lowered VEC_COND_EXPR.
	* tree-vect-generic.c (expand_vector_divmod): Expand to SSA_NAME.
	(expand_vector_condition): Expand tcc_comparison of a VEC_COND_EXPR
	into a SSA_NAME.
	(gimple_expand_vec_cond_expr): New.
	(gimple_expand_vec_cond_exprs): New.
	(class pass_gimple_isel): New.
	(make_pass_gimple_isel): New.
---
 gcc/expr.c              |  25 +----
 gcc/internal-fn.c       |  98 +++++++++++++++++
 gcc/internal-fn.def     |   5 +
 gcc/optabs.c            | 124 +--------------------
 gcc/optabs.h            |   7 +-
 gcc/passes.def          |   1 +
 gcc/tree-cfg.c          |   8 ++
 gcc/tree-pass.h         |   1 +
 gcc/tree-ssa-forwprop.c |   6 +
 gcc/tree-vect-generic.c | 237 +++++++++++++++++++++++++++++++++++++++-
 10 files changed, 358 insertions(+), 154 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index dfbeae71518..a757394f436 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9205,17 +9205,8 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       if (temp != 0)
 	return temp;
 
-      /* For vector MIN <x, y>, expand it a VEC_COND_EXPR <x <= y, x, y>
-	 and similarly for MAX <x, y>.  */
       if (VECTOR_TYPE_P (type))
-	{
-	  tree t0 = make_tree (type, op0);
-	  tree t1 = make_tree (type, op1);
-	  tree comparison = build2 (code == MIN_EXPR ? LE_EXPR : GE_EXPR,
-				    type, t0, t1);
-	  return expand_vec_cond_expr (type, comparison, t0, t1,
-				       original_target);
-	}
+	gcc_unreachable ();
 
       /* At this point, a MEM target is no longer useful; we will get better
 	 code without it.  */
@@ -9804,10 +9795,6 @@ expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
-    case VEC_COND_EXPR:
-      target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
-      return target;
-
     case VEC_DUPLICATE_EXPR:
       op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
       target = expand_vector_broadcast (mode, op0);
@@ -12138,8 +12125,7 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
   STRIP_NOPS (arg1);
 
   /* For vector typed comparisons emit code to generate the desired
-     all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
-     expander for this.  */
+     all-ones or all-zeros mask.  */
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
@@ -12147,12 +12133,7 @@ do_store_flag (sepops ops, rtx target, machine_mode mode)
 	  && expand_vec_cmp_expr_p (TREE_TYPE (arg0), ops->type, ops->code))
 	return expand_vec_cmp_expr (ops->type, ifexp, target);
       else
-	{
-	  tree if_true = constant_boolean_node (true, ops->type);
-	  tree if_false = constant_boolean_node (false, ops->type);
-	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
-				       if_false, target);
-	}
+	gcc_unreachable ();
     }
 
   /* Optimize (x % C1) == C2 or (x % C1) != C2 if it is beneficial
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 5e9aa60721e..aa41b4f6870 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -49,6 +49,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "explow.h"
 
 /* The names of each internal function, indexed by function number.  */
 const char *const internal_fn_name_array[] = {
@@ -2548,6 +2549,103 @@ expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
 
+/* Expand VCOND, VCONDU and VCONDEQ internal functions.
+   The expansion of STMT is based on the associated optab.  */
+
+static void
+expand_vect_cond_optab_fn (internal_fn ifn, gcall *stmt)
+{
+  class expand_operand ops[6];
+  insn_code icode;
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0a = gimple_call_arg (stmt, 0);
+  tree op0b = gimple_call_arg (stmt, 1);
+  tree op1 = gimple_call_arg (stmt, 2);
+  tree op2 = gimple_call_arg (stmt, 3);
+  enum tree_code tcode = (tree_code) int_cst_value (gimple_call_arg (stmt, 4));
+
+  tree vec_cond_type = TREE_TYPE (lhs);
+  tree op_mode = TREE_TYPE (op0a);
+  bool unsignedp = TYPE_UNSIGNED (op_mode);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode cmp_op_mode = TYPE_MODE (op_mode);
+
+  enum optab_tag optab;
+  switch (ifn)
+    {
+    case IFN_VCOND:
+      optab = vcond_optab;
+      break;
+    case IFN_VCONDU:
+      optab = vcondu_optab;
+      break;
+    case IFN_VCONDEQ:
+      optab = vcondeq_optab;
+      break;
+    default:
+      gcc_unreachable ();
+    }
+
+  icode = convert_optab_handler (optab, mode, cmp_op_mode);
+  rtx comparison
+    = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp, icode, 4);
+  rtx rtx_op1 = expand_normal (op1);
+  rtx rtx_op2 = expand_normal (op2);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_fixed_operand (&ops[3], comparison);
+  create_fixed_operand (&ops[4], XEXP (comparison, 0));
+  create_fixed_operand (&ops[5], XEXP (comparison, 1));
+  expand_insn (icode, 6, ops);
+}
+
+#define expand_VCOND expand_vect_cond_optab_fn
+#define expand_VCONDU expand_vect_cond_optab_fn
+#define expand_VCONDEQ expand_vect_cond_optab_fn
+
+/* Expand VCOND_MASK internal function.
+   The expansion of STMT is based on the associated optab.  */
+
+static void
+expand_vect_cond_mask_optab_fn (internal_fn, gcall *stmt)
+{
+  class expand_operand ops[4];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0 = gimple_call_arg (stmt, 0);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
+
+  enum insn_code icode = convert_optab_handler (vcond_mask_optab, mode, mask_mode);
+  rtx mask, rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  mask = expand_normal (op0);
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  mask = force_reg (mask_mode, mask);
+  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_input_operand (&ops[3], mask, mask_mode);
+  expand_insn (icode, 4, ops);
+}
+
+#define expand_VCOND_MASK expand_vect_cond_mask_optab_fn
+
 static void
 expand_ABNORMAL_DISPATCHER (internal_fn, gcall *)
 {
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 1d190d492ff..5602619fd2a 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -319,6 +319,11 @@ DEF_INTERNAL_FN (TSAN_FUNC_EXIT, ECF_NOVOPS | ECF_LEAF | ECF_NOTHROW, NULL)
 DEF_INTERNAL_FN (VA_ARG, ECF_NOTHROW | ECF_LEAF, NULL)
 DEF_INTERNAL_FN (VEC_CONVERT, ECF_CONST | ECF_LEAF | ECF_NOTHROW, NULL)
 
+DEF_INTERNAL_FN(VCOND, ECF_NOTHROW | ECF_LEAF, NULL)
+DEF_INTERNAL_FN(VCONDU, ECF_NOTHROW | ECF_LEAF, NULL)
+DEF_INTERNAL_FN(VCONDEQ, ECF_NOTHROW | ECF_LEAF, NULL)
+DEF_INTERNAL_FN(VCOND_MASK, ECF_NOTHROW | ECF_LEAF, NULL)
+
 /* An unduplicable, uncombinable function.  Generally used to preserve
    a CFG property in the face of jump threading, tail merging or
    other such optimizations.  The first argument distinguishes
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 7a4ec1ec01c..6621a1462b9 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5442,7 +5442,7 @@ get_rtx_code (enum tree_code tcode, bool unsignedp)
    first comparison operand for insn ICODE.  Do not generate the
    compare instruction itself.  */
 
-static rtx
+rtx
 vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
 		    tree t_op0, tree t_op1, bool unsignedp,
 		    enum insn_code icode, unsigned int opno)
@@ -5809,128 +5809,6 @@ expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
-/* Generate insns for a VEC_COND_EXPR with mask, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_mask_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-			   rtx target)
-{
-  class expand_operand ops[4];
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
-  enum insn_code icode = get_vcond_mask_icode (mode, mask_mode);
-  rtx mask, rtx_op1, rtx_op2;
-
-  if (icode == CODE_FOR_nothing)
-    return 0;
-
-  mask = expand_normal (op0);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  mask = force_reg (mask_mode, mask);
-  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_input_operand (&ops[3], mask, mask_mode);
-  expand_insn (icode, 4, ops);
-
-  return ops[0].value;
-}
-
-/* Generate insns for a VEC_COND_EXPR, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-		      rtx target)
-{
-  class expand_operand ops[6];
-  enum insn_code icode;
-  rtx comparison, rtx_op1, rtx_op2;
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode cmp_op_mode;
-  bool unsignedp;
-  tree op0a, op0b;
-  enum tree_code tcode;
-
-  if (COMPARISON_CLASS_P (op0))
-    {
-      op0a = TREE_OPERAND (op0, 0);
-      op0b = TREE_OPERAND (op0, 1);
-      tcode = TREE_CODE (op0);
-    }
-  else
-    {
-      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
-      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
-	  != CODE_FOR_nothing)
-	return expand_vec_cond_mask_expr (vec_cond_type, op0, op1,
-					  op2, target);
-      /* Fake op0 < 0.  */
-      else
-	{
-	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
-		      == MODE_VECTOR_INT);
-	  op0a = op0;
-	  op0b = build_zero_cst (TREE_TYPE (op0));
-	  tcode = LT_EXPR;
-	}
-    }
-  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
-
-
-  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
-	      && known_eq (GET_MODE_NUNITS (mode),
-			   GET_MODE_NUNITS (cmp_op_mode)));
-
-  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
-  if (icode == CODE_FOR_nothing)
-    {
-      if (tcode == LT_EXPR
-	  && op0a == op0
-	  && TREE_CODE (op0) == VECTOR_CST)
-	{
-	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
-	     into a constant when only get_vcond_eq_icode is supported.
-	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
-	  unsigned HOST_WIDE_INT nelts;
-	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
-	    {
-	      if (VECTOR_CST_STEPPED_P (op0))
-		return 0;
-	      nelts = vector_cst_encoded_nelts (op0);
-	    }
-	  for (unsigned int i = 0; i < nelts; ++i)
-	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
-	      return 0;
-	  tcode = NE_EXPR;
-	}
-      if (tcode == EQ_EXPR || tcode == NE_EXPR)
-	icode = get_vcond_eq_icode (mode, cmp_op_mode);
-      if (icode == CODE_FOR_nothing)
-	return 0;
-    }
-
-  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
-				   icode, 4);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
-  expand_insn (icode, 6, ops);
-  return ops[0].value;
-}
-
 /* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
    Use TARGET for the result if nonnull and convenient.  */
 
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 5bd19503a0a..7c2ec257cb0 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -321,9 +321,6 @@ extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
 /* Generate code for vector comparison.  */
 extern rtx expand_vec_cmp_expr (tree, tree, rtx);
 
-/* Generate code for VEC_COND_EXPR.  */
-extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
-
 /* Generate code for VEC_SERIES_EXPR.  */
 extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
 
@@ -364,5 +361,9 @@ extern void expand_jump_insn (enum insn_code icode, unsigned int nops,
 			      class expand_operand *ops);
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
+extern rtx vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
+			       tree t_op0, tree t_op1, bool unsignedp,
+			       enum insn_code icode, unsigned int opno);
+
 
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/passes.def b/gcc/passes.def
index 92cbe587a8a..e9f59d756c9 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -398,6 +398,7 @@ along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cleanup_eh);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
+  NEXT_PASS (pass_gimple_isel);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index d06a479e570..16ff06fbf88 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4199,6 +4199,14 @@ verify_gimple_assign_ternary (gassign *stmt)
 	  debug_generic_expr (rhs1_type);
 	  return true;
 	}
+      else if (cfun->curr_properties & PROP_gimple_lvec
+	       && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_comparison)
+	{
+	  error ("the first argument of %<VEC_COND_EXPR%> cannot be "
+		 "a %<GENERIC%> tree comparison expression");
+	  debug_generic_expr (rhs1);
+	  return true;
+	}
       /* Fallthrough.  */
     case COND_EXPR:
       if (!is_gimple_val (rhs1)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index 576b3f67434..4efece1b35b 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -626,6 +626,7 @@ extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gimple_isel (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 759baf56897..fce392e204c 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -3125,6 +3125,12 @@ pass_forwprop::execute (function *fun)
 		    if (code == COND_EXPR
 			|| code == VEC_COND_EXPR)
 		      {
+			/* Do not propagate into VEC_COND_EXPRs after they
+			   have been lowered by the vector lowering pass.  */
+			if (code == VEC_COND_EXPR
+			    && (fun->curr_properties & PROP_gimple_lvec))
+			  break;
+
 			/* In this case the entire COND_EXPR is in rhs1. */
 			if (forward_propagate_into_cond (&gsi))
 			  {
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index a7fe83da0e3..8f6d63f01c5 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -694,12 +694,14 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	  if (addend == NULL_TREE
 	      && expand_vec_cond_expr_p (type, type, LT_EXPR))
 	    {
-	      tree zero, cst, cond, mask_type;
-	      gimple *stmt;
+	      tree zero, cst, mask_type, mask;
+	      gimple *stmt, *cond;
 
 	      mask_type = truth_type_for (type);
 	      zero = build_zero_cst (type);
-	      cond = build2 (LT_EXPR, mask_type, op0, zero);
+	      mask = make_ssa_name (mask_type);
+	      cond = gimple_build_assign (mask, LT_EXPR, op0, zero);
+	      gsi_insert_before (gsi, cond, GSI_SAME_STMT);
 	      tree_vector_builder vec (type, nunits, 1);
 	      for (i = 0; i < nunits; i++)
 		vec.quick_push (build_int_cst (TREE_TYPE (type),
@@ -707,8 +709,8 @@ expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 						<< shifts[i]) - 1));
 	      cst = vec.build ();
 	      addend = make_ssa_name (type);
-	      stmt = gimple_build_assign (addend, VEC_COND_EXPR, cond,
-					  cst, zero);
+	      stmt
+		= gimple_build_assign (addend, VEC_COND_EXPR, mask, cst, zero);
 	      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 	    }
 	}
@@ -964,7 +966,17 @@ expand_vector_condition (gimple_stmt_iterator *gsi)
     }
 
   if (expand_vec_cond_expr_p (type, TREE_TYPE (a1), TREE_CODE (a)))
-    return;
+    {
+      if (a_is_comparison)
+	{
+	  a = gimplify_build2 (gsi, TREE_CODE (a), TREE_TYPE (a), a1, a2);
+	  gimple_assign_set_rhs1 (stmt, a);
+	  update_stmt (stmt);
+	  return;
+	}
+      gcc_assert (TREE_CODE (a) == SSA_NAME || TREE_CODE (a) == VECTOR_CST);
+      return;
+    }
 
   /* Handle vector boolean types with bitmasks.  If there is a comparison
      and we can expand the comparison into the vector boolean bitmask,
@@ -2241,6 +2253,176 @@ expand_vector_operations (void)
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
 
+/* Expand a VEC_COND_EXPR gimple assignment into a call to the internal
+   function that matches the selected expansion strategy.  */
+
+static gimple *
+gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi,
+			     hash_map<tree, unsigned int> *vec_cond_ssa_name_uses)
+{
+  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
+  enum tree_code code;
+  enum tree_code tcode;
+  machine_mode cmp_op_mode;
+  bool unsignedp;
+  enum insn_code icode;
+  imm_use_iterator imm_iter;
+
+  /* Only consider code == GIMPLE_ASSIGN.  */
+  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
+  if (!stmt)
+    return NULL;
+
+  code = gimple_assign_rhs_code (stmt);
+  if (code != VEC_COND_EXPR)
+    return NULL;
+
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+  tree op2 = gimple_assign_rhs3 (stmt);
+  lhs = gimple_assign_lhs (stmt);
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+
+  gcc_assert (!COMPARISON_CLASS_P (op0));
+  if (TREE_CODE (op0) == SSA_NAME)
+    {
+      unsigned int used_vec_cond_exprs = 0;
+      unsigned int *slot = vec_cond_ssa_name_uses->get (op0);
+      if (slot)
+	used_vec_cond_exprs = *slot;
+      else
+	{
+	  gimple *use_stmt;
+	  FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
+	    {
+	      gassign *assign = dyn_cast<gassign *> (use_stmt);
+	      if (assign != NULL
+		  && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
+		  && gimple_assign_rhs1 (assign) == op0)
+		used_vec_cond_exprs++;
+	    }
+	  vec_cond_ssa_name_uses->put (op0, used_vec_cond_exprs);
+	}
+
+      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
+      if (def_stmt)
+	{
+	  tcode = gimple_assign_rhs_code (def_stmt);
+	  op0a = gimple_assign_rhs1 (def_stmt);
+	  op0b = gimple_assign_rhs2 (def_stmt);
+
+	  tree op0a_type = TREE_TYPE (op0a);
+	  if (used_vec_cond_exprs >= 2
+	      && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
+		  != CODE_FOR_nothing)
+	      && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
+	    {
+	      /* Keep the SSA name and use vcond_mask.  */
+	      tcode = TREE_CODE (op0);
+	    }
+	}
+      else
+	tcode = TREE_CODE (op0);
+    }
+  else
+    tcode = TREE_CODE (op0);
+
+  if (TREE_CODE_CLASS (tcode) != tcc_comparison)
+    {
+      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
+      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
+	  != CODE_FOR_nothing)
+	return gimple_build_call_internal (IFN_VCOND_MASK, 3, op0, op1, op2);
+      /* Fake op0 < 0.  */
+      else
+	{
+	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
+		      == MODE_VECTOR_INT);
+	  op0a = op0;
+	  op0b = build_zero_cst (TREE_TYPE (op0));
+	  tcode = LT_EXPR;
+	}
+    }
+  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+
+
+  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
+	      && known_eq (GET_MODE_NUNITS (mode),
+			   GET_MODE_NUNITS (cmp_op_mode)));
+
+  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    {
+      if (tcode == LT_EXPR
+	  && op0a == op0
+	  && TREE_CODE (op0) == VECTOR_CST)
+	{
+	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
+	     into a constant when only get_vcond_eq_icode is supported.
+	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
+	  unsigned HOST_WIDE_INT nelts;
+	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
+	    {
+	      if (VECTOR_CST_STEPPED_P (op0))
+		gcc_unreachable ();
+	      nelts = vector_cst_encoded_nelts (op0);
+	    }
+	  for (unsigned int i = 0; i < nelts; ++i)
+	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
+	      gcc_unreachable ();
+	  tcode = NE_EXPR;
+	}
+      if (tcode == EQ_EXPR || tcode == NE_EXPR)
+	{
+	  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+	  return gimple_build_call_internal (IFN_VCONDEQ, 5, op0a, op0b, op1,
+					     op2, tcode_tree);
+	}
+    }
+
+  gcc_assert (icode != CODE_FOR_nothing);
+  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+  return gimple_build_call_internal (unsignedp ? IFN_VCONDU : IFN_VCOND,
+				     5, op0a, op0b, op1, op2, tcode_tree);
+}
+
+/* Iterate all gimple statements and try to expand
+   VEC_COND_EXPR assignments.  */
+
+static unsigned int
+gimple_expand_vec_cond_exprs (void)
+{
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+  bool cfg_changed = false;
+  hash_map<tree, unsigned int> vec_cond_ssa_name_uses;
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *g = gimple_expand_vec_cond_expr (&gsi,
+						   &vec_cond_ssa_name_uses);
+	  if (g != NULL)
+	    {
+	      tree lhs = gimple_assign_lhs (gsi_stmt (gsi));
+	      gimple_set_lhs (g, lhs);
+	      gsi_replace (&gsi, g, false);
+	    }
+	  /* ???  If we do not cleanup EH then we will ICE in
+	     verification.  But in reality we have created wrong-code
+	     as we did not properly transition EH info and edges to
+	     the piecewise computations.  */
+	  if (maybe_clean_eh_stmt (gsi_stmt (gsi))
+	      && gimple_purge_dead_eh_edges (bb))
+	    cfg_changed = true;
+	}
+    }
+
+  return cfg_changed ? TODO_cleanup_cfg : 0;
+}
+
 namespace {
 
 const pass_data pass_data_lower_vector =
@@ -2324,4 +2506,47 @@ make_pass_lower_vector_ssa (gcc::context *ctxt)
   return new pass_lower_vector_ssa (ctxt);
 }
 
+namespace {
+
+const pass_data pass_data_gimple_isel =
+{
+  GIMPLE_PASS, /* type */
+  "isel", /* name */
+  OPTGROUP_VEC, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_gimple_isel : public gimple_opt_pass
+{
+public:
+  pass_gimple_isel (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_gimple_isel, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return true;
+    }
+
+  virtual unsigned int execute (function *)
+    {
+      return gimple_expand_vec_cond_exprs ();
+    }
+
+}; // class pass_gimple_isel
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_gimple_isel (gcc::context *ctxt)
+{
+  return new pass_gimple_isel (ctxt);
+}
+
 #include "gt-tree-vect-generic.h"
-- 
2.26.2
Richard Sandiford May 28, 2020, 3:28 p.m. | #13
Martin Liška <mliska@suse.cz> writes:
> Hi.
>
> There's a new patch that adds normal internal functions for the 4
> VCOND* functions.
>
> The patch survives bootstrap and regression
> tests on x86_64-linux-gnu and ppc64le-linux-gnu.

I think this has the same problem as the previous one.  What I meant
in yesterday's message is that:

  expand_insn (icode, 6, ops);

is simply not valid when icode is allowed to FAIL.  That's true in
any context, not just internal functions.  If icode does FAIL,
the expand_insn call will ICE:

  if (!maybe_expand_insn (icode, nops, ops))
    gcc_unreachable ();

When using optabs you either:

(a) declare that the md patterns aren't allowed to FAIL.  expand_insn
    is for this case.

(b) allow the md patterns to FAIL and provide a fallback when they do.
    maybe_expand_insn is for this case.

So if we keep IFN_VCOND, we need to use maybe_expand_insn and find some
way of implementing the IFN_VCOND when the pattern FAILs.
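
For illustration, the expander might then be restructured along these
lines -- a sketch only, reusing the names (icode, ops, watermark) from
earlier in the thread; what exactly belongs in the fallback is the open
question:

  rtx_insn *watermark = get_last_insn ();
  if (!maybe_expand_insn (icode, 6, ops))
    {
      /* Throw away any partial sequence the FAILing pattern emitted.  */
      delete_insns_since (watermark);
      /* Fallback: e.g. expand the comparison on its own and select the
         result with bitwise operations.  */
      ...
    }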

Thanks,
Richard
Iain Buclaw via Gcc-patches May 29, 2020, 12:17 p.m. | #14
On Thu, May 28, 2020 at 5:28 PM Richard Sandiford
<richard.sandiford@arm.com> wrote:
>
> Martin Liška <mliska@suse.cz> writes:
> > Hi.
> >
> > There's a new patch that adds normal internal functions for the 4
> > VCOND* functions.
> >
> > The patch survives bootstrap and regression
> > tests on x86_64-linux-gnu and ppc64le-linux-gnu.
>
> I think this has the same problem as the previous one.  What I meant
> in yesterday's message is that:
>
>   expand_insn (icode, 6, ops);
>
> is simply not valid when icode is allowed to FAIL.  That's true in
> any context, not just internal functions.  If icode does FAIL,
> the expand_insn call will ICE:
>
>   if (!maybe_expand_insn (icode, nops, ops))
>     gcc_unreachable ();
>
> When using optabs you either:
>
> (a) declare that the md patterns aren't allowed to FAIL.  expand_insn
>     is for this case.
>
> (b) allow the md patterns to FAIL and provide a fallback when they do.
>     maybe_expand_insn is for this case.
>
> So if we keep IFN_VCOND, we need to use maybe_expand_insn and find some
> way of implementing the IFN_VCOND when the pattern FAILs.

But we should not have generated the pattern in that case - we actually verify
we can expand at the time we do this "instruction selection".  This is in-line
with other vectorizations where we also do not expect things to FAIL.

See also the expanders that are removed in the patch.

But adding a comment in the internal function expander to reflect this
is probably good, also pointing to the verification routines (the
preexisting expand_vec_cond_expr_p and expand_vec_cmp_expr_p
routines).  Because of this pre-verification I suggested the direct
internal function first, not being aware of the static cannot-FAIL logic.

Now it looks like those verifications also simply check optab
availability only, but then this is just a preexisting issue (and we can
possibly build a testcase that FAILs RTL expansion for power...).
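
Such a testcase might look something like this (purely hypothetical;
whether it actually reaches a FAILing vcond pattern on powerpc* is
exactly what would need checking):

  typedef double v2df __attribute__ ((vector_size (16)));
  typedef long long v2di __attribute__ ((vector_size (16)));

  v2df f (v2df a, v2df b, v2df c, v2df d)
  {
    v2di m = a < b;     /* vector comparison */
    return m ? c : d;   /* vector selection, i.e. a VEC_COND_EXPR */
  }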

So given that this means the latent bug in the powerpc backend
should be fixed and we should use a direct internal function instead?

Thanks,
Richard.

> Thanks,
> Richard
Iain Buclaw via Gcc-patches May 29, 2020, 12:43 p.m. | #15
On Fri, May 29, 2020 at 2:17 PM Richard Biener
<richard.guenther@gmail.com> wrote:
>
> On Thu, May 28, 2020 at 5:28 PM Richard Sandiford
> <richard.sandiford@arm.com> wrote:
> >
> > Martin Liška <mliska@suse.cz> writes:
> > > Hi.
> > >
> > > There's a new patch that adds normal internal functions for the 4
> > > VCOND* functions.
> > >
> > > The patch survives bootstrap and regression
> > > tests on x86_64-linux-gnu and ppc64le-linux-gnu.
> >
> > I think this has the same problem as the previous one.  What I meant
> > in yesterday's message is that:
> >
> >   expand_insn (icode, 6, ops);
> >
> > is simply not valid when icode is allowed to FAIL.  That's true in
> > any context, not just internal functions.  If icode does FAIL,
> > the expand_insn call will ICE:
> >
> >   if (!maybe_expand_insn (icode, nops, ops))
> >     gcc_unreachable ();
> >
> > When using optabs you either:
> >
> > (a) declare that the md patterns aren't allowed to FAIL.  expand_insn
> >     is for this case.
> >
> > (b) allow the md patterns to FAIL and provide a fallback when they do.
> >     maybe_expand_insn is for this case.
> >
> > So if we keep IFN_VCOND, we need to use maybe_expand_insn and find some
> > way of implementing the IFN_VCOND when the pattern FAILs.
>
> But we should not have generated the pattern in that case - we actually verify
> we can expand at the time we do this "instruction selection".  This is in-line
> with other vectorizations where we also do not expect things to FAIL.
>
> See also the expanders that are removed in the patch.
>
> But adding a comment in the internal function expander to reflect this
> is probably good, also pointing to the verification routines (the
> preexisting expand_vec_cond_expr_p and expand_vec_cmp_expr_p
> routines).  Because of this pre-verification I suggested the direct
> internal function first, not being aware of the static cannot-FAIL logic.
>
> Now it looks like those verifications also simply check optab
> availability only, but then this is just a preexisting issue (and we can
> possibly build a testcase that FAILs RTL expansion for power...).
>
> So given that this means the latent bug in the powerpc backend
> should be fixed and we should use a direct internal function instead?

So I tried to understand the circumstances under which the rs6000 patterns
FAIL, but FAILed ;)  It looks like some early outs of
rs6000_emit_vector_cond_expr are unwarranted and the following should work:

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 8435bc15d72..5503215a00a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -14638,8 +14638,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
        rtx mask2;

        rev_code = reverse_condition_maybe_unordered (rcode);
-       if (rev_code == UNKNOWN)
-         return NULL_RTX;
+       gcc_assert (rev_code != UNKNOWN);

        nor_code = optab_handler (one_cmpl_optab, dmode);
        if (nor_code == CODE_FOR_nothing)
@@ -14737,8 +14736,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
   rtx cond2;
   bool invert_move = false;

-  if (VECTOR_UNIT_NONE_P (dest_mode))
-    return 0;
+  gcc_assert (VECTOR_UNIT_NONE_P (dest_mode));

   gcc_assert (GET_MODE_SIZE (dest_mode) == GET_MODE_SIZE (mask_mode)
              && GET_MODE_NUNITS (dest_mode) == GET_MODE_NUNITS (mask_mode));
@@ -14756,8 +14754,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
         e.g., A  = (B != C) ? D : E becomes A = (B == C) ? E : D.  */
       invert_move = true;
       rcode = reverse_condition_maybe_unordered (rcode);
-      if (rcode == UNKNOWN)
-       return 0;
+      gcc_assert (rcode != UNKNOWN);
       break;

     case GE:

which leaves the

  /* Get the vector mask for the given relational operations.  */
  mask = rs6000_emit_vector_compare (rcode, cc_op0, cc_op1, mask_mode);

  if (!mask)
    return 0;

fail but that function recurses heavily - from reading
rs6000_emit_vector_compare_inner
it looks like power can do a lot of compares but not floating-point LT, which
reverse_condition_maybe_unordered would turn into UNGE, which is not
handled either.
But then rs6000_emit_vector_compare just tries GT for that anyway (not UNGE),
so it is actually handled (but should it be?).

So I bet the expansion of the patterns cannot fail at the moment.  Thus I'd
replace the FAIL with a gcc_unreachable () and see if we have test
coverage for those FAILs.

Segher - do you actually know this code to guess why the patterns are defensive?

Thanks,
Richard.

> Thanks,
> Richard.
>
> > Thanks,
> > Richard
Segher Boessenkool May 29, 2020, 3:39 p.m. | #16
On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:
> Now it looks like those verifications also simply check optab
> availability only, but then this is just a preexisting issue (and we can
> possibly build a testcase that FAILs RTL expansion for power...).
>
> So given that this means the latent bug in the powerpc backend
> should be fixed and we should use a direct internal function instead?

I don't see what you consider a bug in the backend here?  The expansion
FAILs, and it is explicitly allowed to do that.

Not allowed to FAIL are:
-- The "lanes" things;
-- vec_duplicate, vec_series;
-- maskload, maskstore;
-- fmin, fmax;
-- madd and friends;
-- sqrt, rsqrt;
-- fmod, remainder;
-- scalb, ldexp;
-- sin, cos, tan, asin, acos, atan;
-- exp, expm1, exp10, exp2, log, log1p, log10, log2, logb;
-- significand, pow, atan2, floor, btrunc, round, ceil, nearbyint, rint;
-- copysign, xorsign;
-- ffs, clrsb, clz, ctz, popcount, parity.

All vcond* patterns are allowed to fail.

Maybe ours don't *need* to, but that doesn't change a thing.

In general, it is a Very Good Thing if patterns are allowed to fail: if
they are not allowed to fail, they have to duplicate all the code that
the generic expander should have, into every target that needs it.  It
also makes writing a (new) backend easier.


Segher
Segher Boessenkool May 29, 2020, 4:47 p.m. | #17
On Fri, May 29, 2020 at 02:43:12PM +0200, Richard Biener wrote:
> So I tried to understand the circumstances under which the rs6000 patterns
> FAIL, but FAILed ;)  It looks like some early outs of
> rs6000_emit_vector_cond_expr are unwarranted and the following should work:
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 8435bc15d72..5503215a00a 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -14638,8 +14638,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,

(Different function, btw)

>         rtx mask2;
>
>         rev_code = reverse_condition_maybe_unordered (rcode);
> -       if (rev_code == UNKNOWN)
> -         return NULL_RTX;
> +       gcc_assert (rev_code != UNKNOWN);

reverse_condition_maybe_unordered is documented as possibly returning
UNKNOWN.  The current implementation doesn't, sure.  But fix that first?

rs6000_emit_vector_compare can fail for several other reasons, too --
including when rs6000_emit_vector_compare_inner fails.

> @@ -14737,8 +14736,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
>    rtx cond2;
>    bool invert_move = false;
>
> -  if (VECTOR_UNIT_NONE_P (dest_mode))
> -    return 0;
> +  gcc_assert (VECTOR_UNIT_NONE_P (dest_mode));

Why can this condition never be true?  (Missing a ! btw)

It needs a big comment if you want to make wide assumptions like that,
in any case.  Pretty much *all* (non-trivial) asserts need an explanation.

(And perhaps VECTOR_UNIT_ALTIVEC_OR_VSX_P is better).

>   /* Get the vector mask for the given relational operations.  */
>   mask = rs6000_emit_vector_compare (rcode, cc_op0, cc_op1, mask_mode);
>
>   if (!mask)
>     return 0;
>
> fail but that function recurses heavily - from reading
> rs6000_emit_vector_compare_inner
> it looks like power can do a lot of compares but not floating-point LT, which
> reverse_condition_maybe_unordered would turn into UNGE, which is not
> handled either.
> But then rs6000_emit_vector_compare just tries GT for that anyway (not UNGE),
> so it is actually handled (but should it be?).
>
> So I bet the expansion of the patterns cannot fail at the moment.  Thus I'd
> replace the FAIL with a gcc_unreachable () and see if we have test
> coverage for those FAILs.

I am not comfortable with that at all.

> Segher - do you actually know this code to guess why the patterns are defensive?

Yes.


If you want to change the documented semantics of widely used functions,
please propose that?


Segher
Richard Sandiford May 29, 2020, 4:57 p.m. | #18
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:
>> Now it looks like those verifications also simply check optab
>> availability only, but then this is just a preexisting issue (and we can
>> possibly build a testcase that FAILs RTL expansion for power...).
>>
>> So given that this means the latent bug in the powerpc backend
>> should be fixed and we should use a direct internal function instead?
>
> I don't see what you consider a bug in the backend here?  The expansion
> FAILs, and it is explicitly allowed to do that.

Well, the docs say:

  …  For **certain** named patterns, it may invoke @code{FAIL} to tell the
  compiler to use an alternate way of performing that task.  …

(my emphasis).  Later on they say:

  @findex FAIL
  @item FAIL
  …

  Failure is currently supported only for binary (addition, multiplication,
  shifting, etc.) and bit-field (@code{extv}, @code{extzv}, and @code{insv})
  operations.

which explicitly says that vcond* isn't allowed to fail.

OK, so that list looks out of date.  But still. :-)

We now explicitly say that some patterns aren't allowed to FAIL,
which I guess gives the (implicit) impression that all the others can.
But that wasn't the intention.  The lines were just added for emphasis.
(AFAIK 7f9844caf1ebd513 was the first patch to do this.)

Richard
Richard Sandiford May 29, 2020, 5:05 p.m. | #19
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 02:43:12PM +0200, Richard Biener wrote:
>> So I tried to understand the circumstances under which the rs6000 patterns
>> FAIL, but FAILed ;)  It looks like some early outs of
>> rs6000_emit_vector_cond_expr are unwarranted and the following should work:
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index 8435bc15d72..5503215a00a 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -14638,8 +14638,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
>
> (Different function, btw)
>
>>         rtx mask2;
>>
>>         rev_code = reverse_condition_maybe_unordered (rcode);
>> -       if (rev_code == UNKNOWN)
>> -         return NULL_RTX;
>> +       gcc_assert (rev_code != UNKNOWN);
>
> reverse_condition_maybe_unordered is documented as possibly returning
> UNKNOWN.  The current implementation doesn't, sure.  But fix that first?
>
> rs6000_emit_vector_compare can fail for several other reasons, too --
> including when rs6000_emit_vector_compare_inner fails.
>
>> @@ -14737,8 +14736,7 @@ rs6000_emit_vector_cond_expr (rtx dest, rtx op_true, rtx op_false,
>>    rtx cond2;
>>    bool invert_move = false;
>>
>> -  if (VECTOR_UNIT_NONE_P (dest_mode))
>> -    return 0;
>> +  gcc_assert (VECTOR_UNIT_NONE_P (dest_mode));
>
> Why can this condition never be true?  (Missing a ! btw)
>
> It needs a big comment if you want to make wide assumptions like that,
> in any case.  Pretty much *all* (non-trivial) asserts need an explanation.
>
> (And perhaps VECTOR_UNIT_ALTIVEC_OR_VSX_P is better).
>
>>   /* Get the vector mask for the given relational operations.  */
>>   mask = rs6000_emit_vector_compare (rcode, cc_op0, cc_op1, mask_mode);
>>
>>   if (!mask)
>>     return 0;
>>
>> fail but that function recurses heavily - from reading
>> rs6000_emit_vector_compare_inner
>> it looks like power can do a lot of compares but not floating-point LT, which
>> reverse_condition_maybe_unordered would turn into UNGE, which is not
>> handled either.
>> But then rs6000_emit_vector_compare just tries GT for that anyway (not UNGE),
>> so it is actually handled (but should it be?).
>>
>> So I bet the expansion of the patterns cannot fail at the moment.  Thus I'd
>> replace the FAIL with a gcc_unreachable () and see if we have test
>> coverage for those FAILs.
>
> I am not comfortable with that at all.
>
>> Segher - do you actually know this code to guess why the patterns are defensive?
>
> Yes.

In that case, can you give a specific example in which the patterns do
actually fail?

I think Richard's point is that even the current compiler will ICE if
the vcond* patterns fail.  All Martin's patch did was expose that via
the extra static checking we get for directly-mapped internal fns.
If you want us to fix that by providing a fallback, we need to know what
the fallback should do.  E.g. the obvious thing would be to emit the
embedded comparison separately and then emit bitwise operations to
implement the select.  But in the powerpc case, it's actually the
comparison that's the potential problem, so that expansion would just
kick the can further down the road.
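
As a plain C illustration of that bitwise-select idea (a sketch only;
the real fallback would be emitted as RTL, and the element type here is
illustrative):

  typedef int v4si __attribute__ ((vector_size (16)));

  /* Elementwise r = mask ? b : c, relying on each mask element being
     all-ones or all-zeros, as vector comparisons produce.  */
  v4si bitwise_select (v4si mask, v4si b, v4si c)
  {
    return (b & mask) | (c & ~mask);
  }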

So which vector comparisons doesn't powerpc support, and what should the
fallback vcond* expansion for them be?

Richard
Segher Boessenkool May 29, 2020, 5:09 p.m. | #20
On Fri, May 29, 2020 at 05:57:13PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:
> > On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:
> >> Now it looks like those verifications also simply check optab
> >> availability only, but then this is just a preexisting issue (and we can
> >> possibly build a testcase that FAILs RTL expansion for power...).
> >>
> >> So given that this means the latent bug in the powerpc backend
> >> should be fixed and we should use a direct internal function instead?
> >
> > I don't see what you consider a bug in the backend here?  The expansion
> > FAILs, and it is explicitly allowed to do that.
>
> Well, the docs say:
>
>   …  For **certain** named patterns, it may invoke @code{FAIL} to tell the
>   compiler to use an alternate way of performing that task.  …
>
> (my emphasis).  Later on they say:
>
>   @findex FAIL
>   @item FAIL
>   …
>
>   Failure is currently supported only for binary (addition, multiplication,
>   shifting, etc.) and bit-field (@code{extv}, @code{extzv}, and @code{insv})
>   operations.
>
> which explicitly says that vcond* isn't allowed to fail.
>
> OK, so that list looks out of date.  But still. :-)
>
> We now explicitly say that some patterns aren't allowed to FAIL,
> which I guess gives the (implicit) impression that all the others can.
> But that wasn't the intention.  The lines were just added for emphasis.
> (AFAIK 7f9844caf1ebd513 was the first patch to do this.)

Most patterns *do* FAIL on some target.  We cannot rewind time.


Segher
Richard Sandiford May 29, 2020, 5:26 p.m. | #21
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 05:57:13PM +0100, Richard Sandiford wrote:

>> Segher Boessenkool <segher@kernel.crashing.org> writes:

>> > On Fri, May 29, 2020 at 02:17:00PM +0200, Richard Biener wrote:

>> >> Now it looks like those verifications also simply check optab

>> >> availability only, but then this is just a preexisting issue (and we can

>> >> possibly build a testcase that FAILs RTL expansion for power...).

>> >> 

>> >> So given that this means the latent bug in the powerpc backend

>> >> should be fixed and we should use a direct internal function instead?

>> >

>> > I don't see what you consider a bug in the backend here?  The expansion

>> > FAILs, and it is explicitly allowed to do that.

>> 

>> Well, the docs say:

>> 

>>   …  For **certain** named patterns, it may invoke @code{FAIL} to tell the

>>   compiler to use an alternate way of performing that task.  …

>> 

>> (my emphasis).  Later on they say:

>> 

>>   @findex FAIL

>>   @item FAIL

>>   …

>> 

>>   Failure is currently supported only for binary (addition, multiplication,

>>   shifting, etc.) and bit-field (@code{extv}, @code{extzv}, and @code{insv})

>>   operations.

>> 

>> which explicitly says that vcond* isn't allowed to fail.

>> 

>> OK, so that list looks out of date.  But still. :-)

>> 

>> We now explicitly say that some patterns aren't allowed to FAIL,

>> which I guess gives the (implicit) impression that all the others can.

>> But that wasn't the intention.  The lines were just added for emphasis.

>> (AFAIK 7f9844caf1ebd513 was the first patch to do this.)

>

> Most patterns *do* FAIL on some target.  We cannot rewind time.


Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.
In fact it's the opposite.

If we ignore the docs and look at what the status quo actually is --
which I agree seems safest for GCC :-) -- then patterns are allowed to
FAIL if target-independent code provides an expand-time fallback for
the FAILing case.  But that isn't true for vcond either.
expand_vec_cond_expr does:

  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
  if (icode == CODE_FOR_nothing)
    ...

  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
				   icode, 4);
  rtx_op1 = expand_normal (op1);
  rtx_op2 = expand_normal (op2);

  create_output_operand (&ops[0], target, mode);
  create_input_operand (&ops[1], rtx_op1, mode);
  create_input_operand (&ops[2], rtx_op2, mode);
  create_fixed_operand (&ops[3], comparison);
  create_fixed_operand (&ops[4], XEXP (comparison, 0));
  create_fixed_operand (&ops[5], XEXP (comparison, 1));
  expand_insn (icode, 6, ops);
  return ops[0].value;

which ICEs if the expander FAILs.
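
For reference, expand_insn itself is (roughly, going from current
optabs.c -- treat this as a paraphrase rather than a verbatim quote):

  void
  expand_insn (enum insn_code icode, unsigned int nops,
	       class expand_operand *ops)
  {
    if (!maybe_expand_insn (icode, nops, ops))
      gcc_unreachable ();
  }

so a FAILed generator makes maybe_expand_insn return false and we
trip the gcc_unreachable.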

So whether you go from the docs or from what's actually implemented,
vcond* isn't currently allowed to FAIL.  All Richard's gcc_unreachable
suggestion would do is change where the ICE happens.

Richard
Segher Boessenkool May 29, 2020, 5:30 p.m. | #22
On Fri, May 29, 2020 at 06:05:14PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:

> > On Fri, May 29, 2020 at 02:43:12PM +0200, Richard Biener wrote:

> >> Segher - do you actually know this code to guess why the patterns are defensive?

> >

> > Yes.

> 

> In that case, can you give a specific example in which the patterns do

> actually fail?


That is a very different question.  (And this is shifting the burden of
proof again.)

> I think Richard's point is that even the current compiler will ICE if

> the vcond* patterns fail.  All Martin's patch did was expose that via

> the extra static checking we get for directly-mapped internal fns.


How will they ICE?

> If you want us to fix that by providing a fallback, we need to know what

> the fallback should do.


Just whatever vcond* is documented to do, of course ;-)

> E.g. the obvious thing would be to emit the

> embedded comparison separately and then emit bitwise operations to

> implement the select.  But in the powerpc case, it's actually the

> comparison that's the potential problem, so that expansion would just

> kick the can further down the road.

> 

> So which vector comparisons doesn't powerpc support, and what should the

> fallback vcond* expansion for them be?


What the hardware can do depends on which set of vector registers is
in use, and on the ISA version as well.  What the backend can do --
well, it is allowed to FAIL these patterns, and it sometimes does.
That's the whole point, isn't it?

vec_cmp* won't FAIL.  I don't know if there is a portable variant of
this?


Segher
Segher Boessenkool May 29, 2020, 5:37 p.m. | #23
On Fri, May 29, 2020 at 06:26:55PM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:

> > Most patterns *do* FAIL on some target.  We cannot rewind time.

> 

> Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.

> In fact it's the opposite.


It has FAILed on rs6000 since 2004.

> If we ignore the docs and look at what the status quo actually is --

> which I agree seems safest for GCC :-) -- then patterns are allowed to

> FAIL if target-independent code provides an expand-time fallback for

> the FAILing case.  But that isn't true for vcond either.


That is a bug in the callers then :-)

> expand_vec_cond_expr does:

> 

>   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);

>   if (icode == CODE_FOR_nothing)

>     ...

> 

>   comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,

> 				   icode, 4);

>   rtx_op1 = expand_normal (op1);

>   rtx_op2 = expand_normal (op2);

> 

>   create_output_operand (&ops[0], target, mode);

>   create_input_operand (&ops[1], rtx_op1, mode);

>   create_input_operand (&ops[2], rtx_op2, mode);

>   create_fixed_operand (&ops[3], comparison);

>   create_fixed_operand (&ops[4], XEXP (comparison, 0));

>   create_fixed_operand (&ops[5], XEXP (comparison, 1));

>   expand_insn (icode, 6, ops);

>   return ops[0].value;

> 

> which ICEs if the expander FAILs.

> 

> So whether you go from the docs or from what's actually implemented,

> vcond* isn't currently allowed to FAIL.  All Richard's gcc_unreachable

> suggestion would do is change where the ICE happens.



>   icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);

>   if (icode == CODE_FOR_nothing)

>     ...


Of course it is allowed to FAIL, based on this code.  That is: the RTL
pattern is allowed to FAIL.  Whatever optabs do, I never understood :-)

Is it vec_cmp that is used by the fallback?  That will never FAIL
for us (if it is enabled at all, natch, same as for any other target).


Segher
Richard Sandiford May 30, 2020, 7:15 a.m. | #24
Segher Boessenkool <segher@kernel.crashing.org> writes:
> On Fri, May 29, 2020 at 06:26:55PM +0100, Richard Sandiford wrote:

>> Segher Boessenkool <segher@kernel.crashing.org> writes:

>> > Most patterns *do* FAIL on some target.  We cannot rewind time.

>> 

>> Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.

>> In fact it's the opposite.

>

> It has FAILed on rs6000 since 2004.


But that just means that the powerpc bug has been there since 2004,
assuming these FAILs can actually trigger in practice.  At that time,
the corresponding expand code was:

/* Generate insns for VEC_COND_EXPR.  */

rtx
expand_vec_cond_expr (tree vec_cond_expr, rtx target)
{
  enum insn_code icode;
  rtx comparison, rtx_op1, rtx_op2, cc_op0, cc_op1;
  enum machine_mode mode = TYPE_MODE (TREE_TYPE (vec_cond_expr));
  bool unsignedp = TYPE_UNSIGNED (TREE_TYPE (vec_cond_expr));

  icode = get_vcond_icode (vec_cond_expr, mode);
  if (icode == CODE_FOR_nothing)
    return 0;

  if (!target)
    target = gen_reg_rtx (mode);

  /* Get comparison rtx.  First expand both cond expr operands.  */
  comparison = vector_compare_rtx (TREE_OPERAND (vec_cond_expr, 0), 
                                   unsignedp, icode);
  cc_op0 = XEXP (comparison, 0);
  cc_op1 = XEXP (comparison, 1);
  /* Expand both operands and force them in reg, if required.  */
  rtx_op1 = expand_expr (TREE_OPERAND (vec_cond_expr, 1),
                         NULL_RTX, VOIDmode, 1);
  if (!(*insn_data[icode].operand[1].predicate) (rtx_op1, mode)
      && mode != VOIDmode)
    rtx_op1 = force_reg (mode, rtx_op1);

  rtx_op2 = expand_expr (TREE_OPERAND (vec_cond_expr, 2),
                         NULL_RTX, VOIDmode, 1);
  if (!(*insn_data[icode].operand[2].predicate) (rtx_op2, mode)
      && mode != VOIDmode)
    rtx_op2 = force_reg (mode, rtx_op2);

  /* Emit instruction! */
  emit_insn (GEN_FCN (icode) (target, rtx_op1, rtx_op2, 
                              comparison, cc_op0,  cc_op1));

  return target;
}

i.e. no fallbacks, and no checking whether the expansion even
succeeded.  Since FAIL just causes the generator to return null,
and since emit_insn is a no-op for null insns, the effect for
FAILs was to emit no instructions and return an uninitialised
target register.

The silent use of an uninitialised register was changed in 2011
to an ICE, via the introduction of expand_insn.
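
To spell out that pre-2011 failure mode, here's a contrived standalone
model -- not GCC code, and gen_vcond_stub/emit_insn_stub are made-up
stand-ins:

  #include <stdio.h>

  struct insn;

  /* A FAILing named pattern: the generator returns null.  */
  static struct insn *
  gen_vcond_stub (void)
  {
    return NULL;
  }

  /* Old emit_insn behaviour: a null insn is silently dropped.  */
  static void
  emit_insn_stub (struct insn *i)
  {
    if (i == NULL)
      return;
    /* ...would otherwise add I to the instruction stream.  */
  }

  int
  main (void)
  {
    int target = -1;  /* stands in for whatever the register held */
    emit_insn_stub (gen_vcond_stub ());
    printf ("%d\n", target);  /* nothing ever wrote to target */
    return 0;
  }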

>> If we ignore the docs and look at what the status quo actually is --

>> which I agree seems safest for GCC :-) -- then patterns are allowed to

>> FAIL if target-independent code provides an expand-time fallback for

>> the FAILing case.  But that isn't true for vcond either.

>

> That is a bug in the callers then :-)


It was a bug in the powerpc patch you cited that added the FAILs
without changing target-independent code to cope with them.

The fact that we've had no code to handle the FAILs for 15+ years
without apparent problems makes it even more likely that the FAILs
never happen in practice.

If you think the FAILs do trigger in practice, please provide an example.

Richard
Segher Boessenkool May 30, 2020, 1:08 p.m. | #25
Hi!

On Sat, May 30, 2020 at 08:15:55AM +0100, Richard Sandiford wrote:
> Segher Boessenkool <segher@kernel.crashing.org> writes:

> >> Sure.  But the point is that FAILing isn't “explicitly allowed” for vcond*.

> >> In fact it's the opposite.


I disagree btw, and no one else has noticed for 16 years either.

In general, almost all patterns can FAIL, and those that cannot are
that way simply because no one wrote fallback code.  That means all
targets that need a fallback have to implement the same thing for
themselves, which is just a waste and causes extra errors.

So, "cannot FAIL" should be a temporary thing, and should change to
"can FAIL" as soon as someone implements that, and never be changed
back -- and it should be how almost everything is in the first place
(and it still is, thankfully).

> > It has FAILed on rs6000 since 2004.

> 

> But that just means that the powerpc bug has been there since 2004,

> assuming these FAILs can actually trigger in practice.  At that time,

> the corresponding expand code was:


I, and I think most other people, thought it was allowed to FAIL (and
I still do).

> rtx

> expand_vec_cond_expr (tree vec_cond_expr, rtx target)


[ snip ]

So this was buggy.

> i.e. no fallbacks, and no checking whether the expansion even

> succeeded.  Since FAIL just causes the generator to return null,

> and since emit_insn is a no-op for null insns, the effect for

> FAILs was to emit no instructions and return an uninitialised

> target register.

> 

> The silent use of an uninitialised register was changed in 2011

> to an ICE, via the introduction of expand_insn.


Yeah, I ran into some of that in 2015, at least then not all of that
was fixed.  That was some very basic insn I think, that really should
never fail, a simple branch or something...  Was surprising though, a
good reminder to always check return values :-)

> The fact that we've had no code to handle the FAILs for 15+ years

> without apparent problems makes it even more likely that the FAILs

> never happen in practice.


AltiVec can do a lot less than VSX (and VSX on p7 can do less than on
p8, and that can do less than p9, etc.), so I am pretty certain it
could fail for some cases.  Until not so long ago these
patterns were mainly (or only?) used via builtins, and the code for
those handles all those cases already.

> If you think the FAILs do trigger in practice, please provide an example.


As I said before, that is completely beside the point.

vcond is allowed to FAIL.  No pattern that can FAIL should ever be
changed to not allow that anymore.  This would make no sense at all.


Segher

Patch

From 4a6f4aa3cdef7a032a4ad442e6cd5ec2e706144d Mon Sep 17 00:00:00 2001
From: Martin Liska <mliska@suse.cz>
Date: Mon, 9 Mar 2020 13:23:03 +0100
Subject: [PATCH] Lower VEC_COND_EXPR into internal functions.

gcc/ChangeLog:

2020-03-30  Martin Liska  <mliska@suse.cz>

	* expr.c (expand_expr_real_2): Add gcc_unreachable; we should not
	reach this path.
	(do_store_flag): Likewise here.
	* internal-fn.c (vec_cond_mask_direct): New.
	(vec_cond_direct): Likewise.
	(vec_condu_direct): Likewise.
	(vec_condeq_direct): Likewise.
	(expand_vect_cond_optab_fn): Move from optabs.c.
	(expand_vec_cond_optab_fn): New alias.
	(expand_vec_condu_optab_fn): Likewise.
	(expand_vec_condeq_optab_fn): Likewise.
	(expand_vect_cond_mask_optab_fn): Moved from optabs.c.
	(expand_vec_cond_mask_optab_fn): New alias.
	(direct_vec_cond_mask_optab_supported_p): New.
	(direct_vec_cond_optab_supported_p): Likewise.
	(direct_vec_condu_optab_supported_p): Likewise.
	(direct_vec_condeq_optab_supported_p): Likewise.
	* internal-fn.def (VCOND): New internal optab function.
	(VCONDU): Likewise.
	(VCONDEQ): Likewise.
	(VCOND_MASK): Likewise.
	* optabs.c (expand_vec_cond_mask_expr): Removed.
	(expand_vec_cond_expr): Likewise.
	* optabs.h (expand_vec_cond_expr): Likewise.
	(vector_compare_rtx): Likewise.
	* passes.def: Add pass_gimple_isel.
	* tree-cfg.c (verify_gimple_assign_ternary): Add new
	GIMPLE check.
	* tree-pass.h (make_pass_gimple_isel): New.
	* tree-ssa-forwprop.c (pass_forwprop::execute): Do not forward
	to already lowered VEC_COND_EXPR.
	* tree-vect-generic.c (expand_vector_divmod): Expand to SSA_NAME.
	(expand_vector_condition): Expand tcc_comparison of a VEC_COND_EXPR
	into a SSA_NAME.
	(gimple_expand_vec_cond_expr): New.
	(gimple_expand_vec_cond_exprs): New.
	(class pass_gimple_isel): New.
	(make_pass_gimple_isel): New.
---
 gcc/expr.c              |  25 +----
 gcc/internal-fn.c       |  89 ++++++++++++++++
 gcc/internal-fn.def     |   5 +
 gcc/optabs.c            | 124 +---------------------
 gcc/optabs.h            |   7 +-
 gcc/passes.def          |   1 +
 gcc/tree-cfg.c          |   8 ++
 gcc/tree-pass.h         |   1 +
 gcc/tree-ssa-forwprop.c |   6 ++
 gcc/tree-vect-generic.c | 226 ++++++++++++++++++++++++++++++++++++++--
 10 files changed, 338 insertions(+), 154 deletions(-)

diff --git a/gcc/expr.c b/gcc/expr.c
index b97c217e86d..d6cecd0f251 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -9200,17 +9200,8 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
       if (temp != 0)
 	return temp;
 
-      /* For vector MIN <x, y>, expand it a VEC_COND_EXPR <x <= y, x, y>
-	 and similarly for MAX <x, y>.  */
       if (VECTOR_TYPE_P (type))
-	{
-	  tree t0 = make_tree (type, op0);
-	  tree t1 = make_tree (type, op1);
-	  tree comparison = build2 (code == MIN_EXPR ? LE_EXPR : GE_EXPR,
-				    type, t0, t1);
-	  return expand_vec_cond_expr (type, comparison, t0, t1,
-				       original_target);
-	}
+	gcc_unreachable ();
 
       /* At this point, a MEM target is no longer useful; we will get better
 	 code without it.  */
@@ -9799,10 +9790,6 @@  expand_expr_real_2 (sepops ops, rtx target, machine_mode tmode,
 	return temp;
       }
 
-    case VEC_COND_EXPR:
-      target = expand_vec_cond_expr (type, treeop0, treeop1, treeop2, target);
-      return target;
-
     case VEC_DUPLICATE_EXPR:
       op0 = expand_expr (treeop0, NULL_RTX, VOIDmode, modifier);
       target = expand_vector_broadcast (mode, op0);
@@ -12133,8 +12120,7 @@  do_store_flag (sepops ops, rtx target, machine_mode mode)
   STRIP_NOPS (arg1);
 
   /* For vector typed comparisons emit code to generate the desired
-     all-ones or all-zeros mask.  Conveniently use the VEC_COND_EXPR
-     expander for this.  */
+     all-ones or all-zeros mask.  */
   if (TREE_CODE (ops->type) == VECTOR_TYPE)
     {
       tree ifexp = build2 (ops->code, ops->type, arg0, arg1);
@@ -12142,12 +12128,7 @@  do_store_flag (sepops ops, rtx target, machine_mode mode)
 	  && expand_vec_cmp_expr_p (TREE_TYPE (arg0), ops->type, ops->code))
 	return expand_vec_cmp_expr (ops->type, ifexp, target);
       else
-	{
-	  tree if_true = constant_boolean_node (true, ops->type);
-	  tree if_false = constant_boolean_node (false, ops->type);
-	  return expand_vec_cond_expr (ops->type, ifexp, if_true,
-				       if_false, target);
-	}
+	gcc_unreachable ();
     }
 
   /* Optimize (x % C1) == C2 or (x % C1) != C2 if it is beneficial
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 52d1638917a..827bd5aa0d2 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-fn.c
@@ -49,6 +49,7 @@  along with GCC; see the file COPYING3.  If not see
 #include "gimple-ssa.h"
 #include "tree-phinodes.h"
 #include "ssa-iterators.h"
+#include "explow.h"
 
 /* The names of each internal function, indexed by function number.  */
 const char *const internal_fn_name_array[] = {
@@ -107,6 +108,10 @@  init_internal_fns ()
 #define mask_store_direct { 3, 2, false }
 #define store_lanes_direct { 0, 0, false }
 #define mask_store_lanes_direct { 0, 0, false }
+#define vec_cond_mask_direct { 0, 0, false }
+#define vec_cond_direct { 0, 0, false }
+#define vec_condu_direct { 0, 0, false }
+#define vec_condeq_direct { 0, 0, false }
 #define scatter_store_direct { 3, 1, false }
 #define unary_direct { 0, 0, true }
 #define binary_direct { 0, 0, true }
@@ -2544,6 +2549,86 @@  expand_mask_store_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
 
 #define expand_mask_store_lanes_optab_fn expand_mask_store_optab_fn
 
+/* Expand VCOND, VCONDU and VCONDEQ optab internal functions.
+   The expansion of STMT is driven by the associated OPTAB.  */
+
+static void
+expand_vect_cond_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[6];
+  insn_code icode;
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0a = gimple_call_arg (stmt, 0);
+  tree op0b = gimple_call_arg (stmt, 1);
+  tree op1 = gimple_call_arg (stmt, 2);
+  tree op2 = gimple_call_arg (stmt, 3);
+  enum tree_code tcode = (tree_code) int_cst_value (gimple_call_arg (stmt, 4));
+
+  tree vec_cond_type = TREE_TYPE (lhs);
+  tree op_mode = TREE_TYPE (op0a);
+  bool unsignedp = TYPE_UNSIGNED (op_mode);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode cmp_op_mode = TYPE_MODE (op_mode);
+
+  icode = convert_optab_handler (optab, mode, cmp_op_mode);
+  rtx comparison
+    = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp, icode, 4);
+  rtx rtx_op1 = expand_normal (op1);
+  rtx rtx_op2 = expand_normal (op2);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_fixed_operand (&ops[3], comparison);
+  create_fixed_operand (&ops[4], XEXP (comparison, 0));
+  create_fixed_operand (&ops[5], XEXP (comparison, 1));
+  expand_insn (icode, 6, ops);
+}
+
+#define expand_vec_cond_optab_fn expand_vect_cond_optab_fn
+#define expand_vec_condu_optab_fn expand_vect_cond_optab_fn
+#define expand_vec_condeq_optab_fn expand_vect_cond_optab_fn
+
+/* Expand VCOND_MASK optab internal function.
+   The expansion of STMT is driven by the associated OPTAB.  */
+
+static void
+expand_vect_cond_mask_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
+{
+  class expand_operand ops[4];
+
+  tree lhs = gimple_call_lhs (stmt);
+  tree op0 = gimple_call_arg (stmt, 0);
+  tree op1 = gimple_call_arg (stmt, 1);
+  tree op2 = gimple_call_arg (stmt, 2);
+  tree vec_cond_type = TREE_TYPE (lhs);
+
+  machine_mode mode = TYPE_MODE (vec_cond_type);
+  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
+  enum insn_code icode = convert_optab_handler (optab, mode, mask_mode);
+  rtx mask, rtx_op1, rtx_op2;
+
+  gcc_assert (icode != CODE_FOR_nothing);
+
+  mask = expand_normal (op0);
+  rtx_op1 = expand_normal (op1);
+  rtx_op2 = expand_normal (op2);
+
+  mask = force_reg (mask_mode, mask);
+  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
+
+  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
+  create_output_operand (&ops[0], target, mode);
+  create_input_operand (&ops[1], rtx_op1, mode);
+  create_input_operand (&ops[2], rtx_op2, mode);
+  create_input_operand (&ops[3], mask, mask_mode);
+  expand_insn (icode, 4, ops);
+}
+
+#define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
+
 static void
 expand_ABNORMAL_DISPATCHER (internal_fn, gcall *)
 {
@@ -3125,6 +3210,10 @@  multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
 #define direct_mask_store_optab_supported_p direct_optab_supported_p
 #define direct_store_lanes_optab_supported_p multi_vector_optab_supported_p
 #define direct_mask_store_lanes_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_cond_mask_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_cond_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_condu_optab_supported_p multi_vector_optab_supported_p
+#define direct_vec_condeq_optab_supported_p multi_vector_optab_supported_p
 #define direct_scatter_store_optab_supported_p convert_optab_supported_p
 #define direct_while_optab_supported_p convert_optab_supported_p
 #define direct_fold_extract_optab_supported_p direct_optab_supported_p
diff --git a/gcc/internal-fn.def b/gcc/internal-fn.def
index 1d190d492ff..0c6fc371190 100644
--- a/gcc/internal-fn.def
+++ b/gcc/internal-fn.def
@@ -136,6 +136,11 @@  DEF_INTERNAL_OPTAB_FN (STORE_LANES, ECF_CONST, vec_store_lanes, store_lanes)
 DEF_INTERNAL_OPTAB_FN (MASK_STORE_LANES, 0,
 		       vec_mask_store_lanes, mask_store_lanes)
 
+DEF_INTERNAL_OPTAB_FN (VCOND, 0, vcond, vec_cond)
+DEF_INTERNAL_OPTAB_FN (VCONDU, 0, vcondu, vec_condu)
+DEF_INTERNAL_OPTAB_FN (VCONDEQ, 0, vcondeq, vec_condeq)
+DEF_INTERNAL_OPTAB_FN (VCOND_MASK, 0, vcond_mask, vec_cond_mask)
+
 DEF_INTERNAL_OPTAB_FN (WHILE_ULT, ECF_CONST | ECF_NOTHROW, while_ult, while)
 DEF_INTERNAL_OPTAB_FN (CHECK_RAW_PTRS, ECF_CONST | ECF_NOTHROW,
 		       check_raw_ptrs, check_ptrs)
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 8dd351286cd..c66c08e7d55 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -5439,7 +5439,7 @@  get_rtx_code (enum tree_code tcode, bool unsignedp)
    first comparison operand for insn ICODE.  Do not generate the
    compare instruction itself.  */
 
-static rtx
+rtx
 vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
 		    tree t_op0, tree t_op1, bool unsignedp,
 		    enum insn_code icode, unsigned int opno)
@@ -5804,128 +5804,6 @@  expand_vec_perm_var (machine_mode mode, rtx v0, rtx v1, rtx sel, rtx target)
   return tmp;
 }
 
-/* Generate insns for a VEC_COND_EXPR with mask, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_mask_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-			   rtx target)
-{
-  class expand_operand ops[4];
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode mask_mode = TYPE_MODE (TREE_TYPE (op0));
-  enum insn_code icode = get_vcond_mask_icode (mode, mask_mode);
-  rtx mask, rtx_op1, rtx_op2;
-
-  if (icode == CODE_FOR_nothing)
-    return 0;
-
-  mask = expand_normal (op0);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  mask = force_reg (mask_mode, mask);
-  rtx_op1 = force_reg (GET_MODE (rtx_op1), rtx_op1);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_input_operand (&ops[3], mask, mask_mode);
-  expand_insn (icode, 4, ops);
-
-  return ops[0].value;
-}
-
-/* Generate insns for a VEC_COND_EXPR, given its TYPE and its
-   three operands.  */
-
-rtx
-expand_vec_cond_expr (tree vec_cond_type, tree op0, tree op1, tree op2,
-		      rtx target)
-{
-  class expand_operand ops[6];
-  enum insn_code icode;
-  rtx comparison, rtx_op1, rtx_op2;
-  machine_mode mode = TYPE_MODE (vec_cond_type);
-  machine_mode cmp_op_mode;
-  bool unsignedp;
-  tree op0a, op0b;
-  enum tree_code tcode;
-
-  if (COMPARISON_CLASS_P (op0))
-    {
-      op0a = TREE_OPERAND (op0, 0);
-      op0b = TREE_OPERAND (op0, 1);
-      tcode = TREE_CODE (op0);
-    }
-  else
-    {
-      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
-      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
-	  != CODE_FOR_nothing)
-	return expand_vec_cond_mask_expr (vec_cond_type, op0, op1,
-					  op2, target);
-      /* Fake op0 < 0.  */
-      else
-	{
-	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
-		      == MODE_VECTOR_INT);
-	  op0a = op0;
-	  op0b = build_zero_cst (TREE_TYPE (op0));
-	  tcode = LT_EXPR;
-	}
-    }
-  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
-  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
-
-
-  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
-	      && known_eq (GET_MODE_NUNITS (mode),
-			   GET_MODE_NUNITS (cmp_op_mode)));
-
-  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
-  if (icode == CODE_FOR_nothing)
-    {
-      if (tcode == LT_EXPR
-	  && op0a == op0
-	  && TREE_CODE (op0) == VECTOR_CST)
-	{
-	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
-	     into a constant when only get_vcond_eq_icode is supported.
-	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
-	  unsigned HOST_WIDE_INT nelts;
-	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
-	    {
-	      if (VECTOR_CST_STEPPED_P (op0))
-		return 0;
-	      nelts = vector_cst_encoded_nelts (op0);
-	    }
-	  for (unsigned int i = 0; i < nelts; ++i)
-	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
-	      return 0;
-	  tcode = NE_EXPR;
-	}
-      if (tcode == EQ_EXPR || tcode == NE_EXPR)
-	icode = get_vcond_eq_icode (mode, cmp_op_mode);
-      if (icode == CODE_FOR_nothing)
-	return 0;
-    }
-
-  comparison = vector_compare_rtx (VOIDmode, tcode, op0a, op0b, unsignedp,
-				   icode, 4);
-  rtx_op1 = expand_normal (op1);
-  rtx_op2 = expand_normal (op2);
-
-  create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[1], rtx_op1, mode);
-  create_input_operand (&ops[2], rtx_op2, mode);
-  create_fixed_operand (&ops[3], comparison);
-  create_fixed_operand (&ops[4], XEXP (comparison, 0));
-  create_fixed_operand (&ops[5], XEXP (comparison, 1));
-  expand_insn (icode, 6, ops);
-  return ops[0].value;
-}
-
 /* Generate VEC_SERIES_EXPR <OP0, OP1>, returning a value of mode VMODE.
    Use TARGET for the result if nonnull and convenient.  */
 
diff --git a/gcc/optabs.h b/gcc/optabs.h
index 5bd19503a0a..7c2ec257cb0 100644
--- a/gcc/optabs.h
+++ b/gcc/optabs.h
@@ -321,9 +321,6 @@  extern rtx expand_vec_perm_const (machine_mode, rtx, rtx,
 /* Generate code for vector comparison.  */
 extern rtx expand_vec_cmp_expr (tree, tree, rtx);
 
-/* Generate code for VEC_COND_EXPR.  */
-extern rtx expand_vec_cond_expr (tree, tree, tree, tree, rtx);
-
 /* Generate code for VEC_SERIES_EXPR.  */
 extern rtx expand_vec_series_expr (machine_mode, rtx, rtx, rtx);
 
@@ -364,5 +361,9 @@  extern void expand_jump_insn (enum insn_code icode, unsigned int nops,
 			      class expand_operand *ops);
 
 extern enum rtx_code get_rtx_code (enum tree_code tcode, bool unsignedp);
+extern rtx vector_compare_rtx (machine_mode cmp_mode, enum tree_code tcode,
+			       tree t_op0, tree t_op1, bool unsignedp,
+			       enum insn_code icode, unsigned int opno);
+
 
 #endif /* GCC_OPTABS_H */
diff --git a/gcc/passes.def b/gcc/passes.def
index 2bf2cb78fc5..d654e5ee9fe 100644
--- a/gcc/passes.def
+++ b/gcc/passes.def
@@ -397,6 +397,7 @@  along with GCC; see the file COPYING3.  If not see
   NEXT_PASS (pass_cleanup_eh);
   NEXT_PASS (pass_lower_resx);
   NEXT_PASS (pass_nrv);
+  NEXT_PASS (pass_gimple_isel);
   NEXT_PASS (pass_cleanup_cfg_post_optimizing);
   NEXT_PASS (pass_warn_function_noreturn);
   NEXT_PASS (pass_gen_hsail);
diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
index f7b817d94e6..7154f436bb8 100644
--- a/gcc/tree-cfg.c
+++ b/gcc/tree-cfg.c
@@ -4197,6 +4197,14 @@  verify_gimple_assign_ternary (gassign *stmt)
 	  debug_generic_expr (rhs1_type);
 	  return true;
 	}
+      else if (cfun->curr_properties & PROP_gimple_lvec
+	       && TREE_CODE_CLASS (TREE_CODE (rhs1)) == tcc_comparison)
+	{
+	  error ("the first argument of %<VEC_COND_EXPR%> cannot be "
+		 "a %<GENERIC%> tree comparison expression");
+	  debug_generic_expr (rhs1);
+	  return true;
+	}
       /* Fallthrough.  */
     case COND_EXPR:
       if (!is_gimple_val (rhs1)
diff --git a/gcc/tree-pass.h b/gcc/tree-pass.h
index a1207a20a3c..490bc9702be 100644
--- a/gcc/tree-pass.h
+++ b/gcc/tree-pass.h
@@ -625,6 +625,7 @@  extern gimple_opt_pass *make_pass_local_fn_summary (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_update_address_taken (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_convert_switch (gcc::context *ctxt);
 extern gimple_opt_pass *make_pass_lower_vaarg (gcc::context *ctxt);
+extern gimple_opt_pass *make_pass_gimple_isel (gcc::context *ctxt);
 
 /* Current optimization pass.  */
 extern opt_pass *current_pass;
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 234c1f7dd7d..ce8537a58a7 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -3057,6 +3057,12 @@  pass_forwprop::execute (function *fun)
 		    if (code == COND_EXPR
 			|| code == VEC_COND_EXPR)
 		      {
+			/* Do not propagate into VEC_COND_EXPRs after the
+			   vector lowering pass.  */
+			if (code == VEC_COND_EXPR
+			    && (fun->curr_properties & PROP_gimple_lvec))
+			  break;
+
 			/* In this case the entire COND_EXPR is in rhs1. */
 			if (forward_propagate_into_cond (&gsi))
 			  {
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 2f6fd5e980c..587faf7eb6e 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -691,12 +691,14 @@  expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 	  if (addend == NULL_TREE
 	      && expand_vec_cond_expr_p (type, type, LT_EXPR))
 	    {
-	      tree zero, cst, cond, mask_type;
-	      gimple *stmt;
+	      tree zero, cst, mask_type, mask;
+	      gimple *stmt, *cond;
 
 	      mask_type = truth_type_for (type);
 	      zero = build_zero_cst (type);
-	      cond = build2 (LT_EXPR, mask_type, op0, zero);
+	      mask = make_ssa_name (mask_type);
+	      cond = gimple_build_assign (mask, LT_EXPR, op0, zero);
+	      gsi_insert_before (gsi, cond, GSI_SAME_STMT);
 	      tree_vector_builder vec (type, nunits, 1);
 	      for (i = 0; i < nunits; i++)
 		vec.quick_push (build_int_cst (TREE_TYPE (type),
@@ -704,8 +706,8 @@  expand_vector_divmod (gimple_stmt_iterator *gsi, tree type, tree op0,
 						<< shifts[i]) - 1));
 	      cst = vec.build ();
 	      addend = make_ssa_name (type);
-	      stmt = gimple_build_assign (addend, VEC_COND_EXPR, cond,
-					  cst, zero);
+	      stmt
+		= gimple_build_assign (addend, VEC_COND_EXPR, mask, cst, zero);
 	      gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 	    }
 	}
@@ -944,7 +946,17 @@  expand_vector_condition (gimple_stmt_iterator *gsi)
     }
 
   if (expand_vec_cond_expr_p (type, TREE_TYPE (a1), TREE_CODE (a)))
-    return;
+    {
+      if (a_is_comparison)
+	{
+	  a = gimplify_build2 (gsi, TREE_CODE (a), TREE_TYPE (a), a1, a2);
+	  gimple_assign_set_rhs1 (stmt, a);
+	  update_stmt (stmt);
+	  return;
+	}
+      gcc_assert (TREE_CODE (a) == SSA_NAME || TREE_CODE (a) == VECTOR_CST);
+      return;
+    }
 
   /* Handle vector boolean types with bitmasks.  If there is a comparison
      and we can expand the comparison into the vector boolean bitmask,
@@ -2224,6 +2236,165 @@  expand_vector_operations (void)
   return cfg_changed ? TODO_cleanup_cfg : 0;
 }
 
+/* Expand all VEC_COND_EXPR gimple assignments into calls to internal
+   functions, based on the type of the selected expansion.  */
+
+static gimple *
+gimple_expand_vec_cond_expr (gimple_stmt_iterator *gsi)
+{
+  tree lhs, op0a = NULL_TREE, op0b = NULL_TREE;
+  enum tree_code code;
+  enum tree_code tcode;
+  machine_mode cmp_op_mode;
+  bool unsignedp;
+  enum insn_code icode;
+  imm_use_iterator imm_iter;
+
+  /* Only consider code == GIMPLE_ASSIGN.  */
+  gassign *stmt = dyn_cast<gassign *> (gsi_stmt (*gsi));
+  if (!stmt)
+    return NULL;
+
+  code = gimple_assign_rhs_code (stmt);
+  if (code != VEC_COND_EXPR)
+    return NULL;
+
+  tree op0 = gimple_assign_rhs1 (stmt);
+  tree op1 = gimple_assign_rhs2 (stmt);
+  tree op2 = gimple_assign_rhs3 (stmt);
+  lhs = gimple_assign_lhs (stmt);
+  machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
+
+  gcc_assert (!COMPARISON_CLASS_P (op0));
+  if (TREE_CODE (op0) == SSA_NAME)
+    {
+      unsigned int used_vec_cond_exprs = 0;
+      gimple *use_stmt;
+      FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, op0)
+	{
+	  gassign *assign = dyn_cast<gassign *> (use_stmt);
+	  if (assign != NULL && gimple_assign_rhs_code (assign) == VEC_COND_EXPR
+	      && gimple_assign_rhs1 (assign) == op0)
+	    used_vec_cond_exprs++;
+	}
+
+      gassign *def_stmt = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (op0));
+      if (def_stmt)
+	{
+	  tcode = gimple_assign_rhs_code (def_stmt);
+	  op0a = gimple_assign_rhs1 (def_stmt);
+	  op0b = gimple_assign_rhs2 (def_stmt);
+
+	  tree op0a_type = TREE_TYPE (op0a);
+	  if (used_vec_cond_exprs >= 2
+	      && (get_vcond_mask_icode (mode, TYPE_MODE (op0a_type))
+		  != CODE_FOR_nothing)
+	      && expand_vec_cmp_expr_p (op0a_type, TREE_TYPE (lhs), tcode))
+	    {
+	      /* Keep the SSA name and use vcond_mask.  */
+	      tcode = TREE_CODE (op0);
+	    }
+	}
+      else
+	tcode = TREE_CODE (op0);
+    }
+  else
+    tcode = TREE_CODE (op0);
+
+  if (TREE_CODE_CLASS (tcode) != tcc_comparison)
+    {
+      gcc_assert (VECTOR_BOOLEAN_TYPE_P (TREE_TYPE (op0)));
+      if (get_vcond_mask_icode (mode, TYPE_MODE (TREE_TYPE (op0)))
+	  != CODE_FOR_nothing)
+	return gimple_build_call_internal (IFN_VCOND_MASK, 3, op0, op1, op2);
+      /* Fake op0 < 0.  */
+      else
+	{
+	  gcc_assert (GET_MODE_CLASS (TYPE_MODE (TREE_TYPE (op0)))
+		      == MODE_VECTOR_INT);
+	  op0a = op0;
+	  op0b = build_zero_cst (TREE_TYPE (op0));
+	  tcode = LT_EXPR;
+	}
+    }
+  cmp_op_mode = TYPE_MODE (TREE_TYPE (op0a));
+  unsignedp = TYPE_UNSIGNED (TREE_TYPE (op0a));
+
+
+  gcc_assert (known_eq (GET_MODE_SIZE (mode), GET_MODE_SIZE (cmp_op_mode))
+	      && known_eq (GET_MODE_NUNITS (mode),
+			   GET_MODE_NUNITS (cmp_op_mode)));
+
+  icode = get_vcond_icode (mode, cmp_op_mode, unsignedp);
+  if (icode == CODE_FOR_nothing)
+    {
+      if (tcode == LT_EXPR
+	  && op0a == op0
+	  && TREE_CODE (op0) == VECTOR_CST)
+	{
+	  /* A VEC_COND_EXPR condition could be folded from EQ_EXPR/NE_EXPR
+	     into a constant when only get_vcond_eq_icode is supported.
+	     Verify < 0 and != 0 behave the same and change it to NE_EXPR.  */
+	  unsigned HOST_WIDE_INT nelts;
+	  if (!VECTOR_CST_NELTS (op0).is_constant (&nelts))
+	    {
+	      if (VECTOR_CST_STEPPED_P (op0))
+		gcc_unreachable ();
+	      nelts = vector_cst_encoded_nelts (op0);
+	    }
+	  for (unsigned int i = 0; i < nelts; ++i)
+	    if (tree_int_cst_sgn (vector_cst_elt (op0, i)) == 1)
+	      gcc_unreachable ();
+	  tcode = NE_EXPR;
+	}
+      if (tcode == EQ_EXPR || tcode == NE_EXPR)
+	{
+	  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+	  return gimple_build_call_internal (IFN_VCONDEQ, 5, op0a, op0b, op1,
+					     op2, tcode_tree);
+	}
+    }
+
+  gcc_assert (icode != CODE_FOR_nothing);
+  tree tcode_tree = build_int_cst (integer_type_node, tcode);
+  return gimple_build_call_internal (unsignedp ? IFN_VCONDU : IFN_VCOND,
+				     5, op0a, op0b, op1, op2, tcode_tree);
+}
+
+/* Iterate over all gimple statements and try to expand
+   VEC_COND_EXPR assignments.  */
+
+static unsigned int
+gimple_expand_vec_cond_exprs (void)
+{
+  gimple_stmt_iterator gsi;
+  basic_block bb;
+  bool cfg_changed = false;
+
+  FOR_EACH_BB_FN (bb, cfun)
+    {
+      for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+	{
+	  gimple *g = gimple_expand_vec_cond_expr (&gsi);
+	  if (g != NULL)
+	    {
+	      tree lhs = gimple_assign_lhs (gsi_stmt (gsi));
+	      gimple_set_lhs (g, lhs);
+	      gsi_replace (&gsi, g, false);
+	    }
+	  /* ???  If we do not cleanup EH then we will ICE in
+	     verification.  But in reality we have created wrong-code
+	     as we did not properly transition EH info and edges to
+	     the piecewise computations.  */
+	  if (maybe_clean_eh_stmt (gsi_stmt (gsi))
+	      && gimple_purge_dead_eh_edges (bb))
+	    cfg_changed = true;
+	}
+    }
+
+  return cfg_changed ? TODO_cleanup_cfg : 0;
+}
+
 namespace {
 
 const pass_data pass_data_lower_vector =
@@ -2307,4 +2478,47 @@  make_pass_lower_vector_ssa (gcc::context *ctxt)
   return new pass_lower_vector_ssa (ctxt);
 }
 
+namespace {
+
+const pass_data pass_data_gimple_isel =
+{
+  GIMPLE_PASS, /* type */
+  "isel", /* name */
+  OPTGROUP_VEC, /* optinfo_flags */
+  TV_NONE, /* tv_id */
+  PROP_cfg, /* properties_required */
+  0, /* properties_provided */
+  0, /* properties_destroyed */
+  0, /* todo_flags_start */
+  TODO_update_ssa, /* todo_flags_finish */
+};
+
+class pass_gimple_isel : public gimple_opt_pass
+{
+public:
+  pass_gimple_isel (gcc::context *ctxt)
+    : gimple_opt_pass (pass_data_gimple_isel, ctxt)
+  {}
+
+  /* opt_pass methods: */
+  virtual bool gate (function *)
+    {
+      return true;
+    }
+
+  virtual unsigned int execute (function *)
+    {
+      return gimple_expand_vec_cond_exprs ();
+    }
+
+}; // class pass_gimple_isel
+
+} // anon namespace
+
+gimple_opt_pass *
+make_pass_gimple_isel (gcc::context *ctxt)
+{
+  return new pass_gimple_isel (ctxt);
+}
+
 #include "gt-tree-vect-generic.h"
-- 
2.26.0
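
For illustration, here is a hand-written before/after of what the new
isel pass does (hedged: not an actual dump, and the SSA names are made
up).  Given a vector-boolean mask_3 that is not defined by a
comparison (say it was loaded), and a target providing the vcond_mask
optab,

  x_4 = VEC_COND_EXPR <mask_3, c_5, d_6>;

is rewritten by gimple_expand_vec_cond_expr into an internal-function
call that expand_vect_cond_mask_optab_fn later expands:

  x_4 = .VCOND_MASK (mask_3, c_5, d_6);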