[Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.

Message ID 20180723165606.GA7949@arm.com
State New
Headers show
Series
  • [Arm] Fix subreg crash in different way by enabling the FP16 pattern unconditionally.
Related show

Commit Message

Tamar Christina July 23, 2018, 4:56 p.m.
Hi All,

My previous patch changed arm_can_change_mode_class to allow subregs of
64bit registers on arm big-endian.  However it seems that we can't do this
because a the data in 64 bit VFP registers are stored in little-endian order,
even on big-endian.

Allowing this change had a knock on effect that caused GCC's no-op detection
to think that loading from the first lane on arm big-endian is a no-op.  this
because we can't describe the weird ordering we have on D registers on big-endian.

The original issue comes from the fact that the code does

... foo (... bar)
{
  return bar;
}

The expansion of the return statement causes GCC to try to return the value in
a register.  GCC will try to emit the move then, from MEM to REG (due to the SSA
temporary.).  It checks for a mov optab for this which isn't available and
then tries to do the move in bits using emit_move_multi_word.

emit_move_multi_word will split the move into sub parts, but then needs to get
the sub parts and does this using subregs, but it's told it can't do subregs!

The compiler is now stuck in an infinite loop.

The way this is worked around in the back-end is that we have move patterns in
neon.md that usually just force the register instead of checking with the
back-end. This prevents emit_move_multi_word from being needed.  However the
pattern for V4HF and V8HF were guarded by TARGET_NEON && TARGET_FP16.

I don't believe the TARGET_FP16 guard to be needed, because the pattern doesn't
actually generate code and requires another pattern for that, and a reg to reg move
should always be possible anyway. So allowing the force to register here is safe
and it allows the compiler to generate a correct error instead of ICEing in an
infinite loop.

This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without introducing
any regressions while fixing

gcc.dg/vect/vect-nop-move.c execution test
g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test
g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test
g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

Regtested on armeb-none-eabi and no regressions.
Bootstrapped on arm-none-linux-gnueabihf and no issues.


Ok for trunk?

Thanks,
Tamar

gcc/
2018-07-23  Tamar Christina  <tamar.christina@arm.com>

	PR target/84711
	* config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.
	* config/arm/neon.md (movv4hf, movv8hf): Refactored to..
	(mov<mov>): ..this and enable unconditionally.

--

Comments

Thomas Preudhomme July 25, 2018, 9:55 a.m. | #1
Hi Tamar,

On Mon, 23 Jul 2018 at 17:56, Tamar Christina <tamar.christina@arm.com> wrote:
>

> Hi All,

>

> My previous patch changed arm_can_change_mode_class to allow subregs of

> 64bit registers on arm big-endian.  However it seems that we can't do this

> because a the data in 64 bit VFP registers are stored in little-endian order,

> even on big-endian.

>

> Allowing this change had a knock on effect that caused GCC's no-op detection

> to think that loading from the first lane on arm big-endian is a no-op.  this

> because we can't describe the weird ordering we have on D registers on big-endian.

>

> The original issue comes from the fact that the code does

>

> ... foo (... bar)

> {

>   return bar;

> }

>

> The expansion of the return statement causes GCC to try to return the value in

> a register.  GCC will try to emit the move then, from MEM to REG (due to the SSA

> temporary.).  It checks for a mov optab for this which isn't available and

> then tries to do the move in bits using emit_move_multi_word.

>

> emit_move_multi_word will split the move into sub parts, but then needs to get

> the sub parts and does this using subregs, but it's told it can't do subregs!

>

> The compiler is now stuck in an infinite loop.

>

> The way this is worked around in the back-end is that we have move patterns in

> neon.md that usually just force the register instead of checking with the

> back-end. This prevents emit_move_multi_word from being needed.  However the

> pattern for V4HF and V8HF were guarded by TARGET_NEON && TARGET_FP16.

>

> I don't believe the TARGET_FP16 guard to be needed, because the pattern doesn't

> actually generate code and requires another pattern for that, and a reg to reg move

> should always be possible anyway. So allowing the force to register here is safe

> and it allows the compiler to generate a correct error instead of ICEing in an

> infinite loop.


How about subreg to subreg move? Doesn't that expand to more insns
(subreg to reg and reg to subreg)? Couldn't you improve the logic to
check that there is actually a mode change so that if there isn't
(like moving from one subreg to another) just expand to a single move?

Best regards,

Thomas

>

> This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without introducing

> any regressions while fixing

>

> gcc.dg/vect/vect-nop-move.c execution test

> g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test

> g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test

> g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

>

> Regtested on armeb-none-eabi and no regressions.

> Bootstrapped on arm-none-linux-gnueabihf and no issues.

>

>

> Ok for trunk?

>

> Thanks,

> Tamar

>

> gcc/

> 2018-07-23  Tamar Christina  <tamar.christina@arm.com>

>

>         PR target/84711

>         * config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.

>         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

>         (mov<mov>): ..this and enable unconditionally.

>

> --
Tamar Christina July 25, 2018, 3:28 p.m. | #2
Hi Thomas,

Thanks for the review!

> >

> > I don't believe the TARGET_FP16 guard to be needed, because the

> > pattern doesn't actually generate code and requires another pattern

> > for that, and a reg to reg move should always be possible anyway. So

> > allowing the force to register here is safe and it allows the compiler

> > to generate a correct error instead of ICEing in an infinite loop.

> 

> How about subreg to subreg move? Doesn't that expand to more insns

> (subreg to reg and reg to subreg)? Couldn't you improve the logic to check

> that there is actually a mode change so that if there isn't (like moving from

> one subreg to another) just expand to a single move?

> 


Yes, but that is not a new issue. My patch is simply removing the TARGET_FP16 restrictions and
merging two patterns that should be one using an iterator and nothing more.

The redundant mov is already there and a different issue than the ICE I'm trying to fix.

None of the code inside the expander is needed at all, the code really only has an effect on subreg
to subreg moves, as `force_reg` doesn't do anything when it's argument is already a reg.

The comment in the expander (which was already there) is wrong. The *reason* the ICE is fixed isn't
because of the `force_reg`. It's because of the mere presence of the expander itself. The expander matches the
standard mov$a optab and so this prevents emit_move_insn_1 from doing the move by subwords as it finds a pattern
that's able to do the move.

The expander however always falls through and doesn’t stop RTL generation. You could remove all the code in there and have
it properly match the *neon_mov instructions which will do the right thing later at code generation time and avoid the redundant
moves.  My guess is the original `force_reg` was copied from the other patterns like `movti` and the existing `mov<mode>`. There It makes
sense because the operands can be MEM or anything general_operand.

However the redundant moves are a different problem than what I'm trying to solve here. So I think that's another patch which requires further
testing.

Regards,
Tamar

> Best regards,

> 

> Thomas

> 

> >

> > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without

> > introducing any regressions while fixing

> >

> > gcc.dg/vect/vect-nop-move.c execution test

> > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test

> > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test

> > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

> >

> > Regtested on armeb-none-eabi and no regressions.

> > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> >

> >

> > Ok for trunk?

> >

> > Thanks,

> > Tamar

> >

> > gcc/

> > 2018-07-23  Tamar Christina  <tamar.christina@arm.com>

> >

> >         PR target/84711

> >         * config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.

> >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> >         (mov<mov>): ..this and enable unconditionally.

> >

> > --
Thomas Preudhomme July 26, 2018, 8:29 a.m. | #3
Hi Tamar,

On Wed, 25 Jul 2018 at 16:28, Tamar Christina <Tamar.Christina@arm.com> wrote:
>

> Hi Thomas,

>

> Thanks for the review!

>

> > >

> > > I don't believe the TARGET_FP16 guard to be needed, because the

> > > pattern doesn't actually generate code and requires another pattern

> > > for that, and a reg to reg move should always be possible anyway. So

> > > allowing the force to register here is safe and it allows the compiler

> > > to generate a correct error instead of ICEing in an infinite loop.

> >

> > How about subreg to subreg move? Doesn't that expand to more insns

> > (subreg to reg and reg to subreg)? Couldn't you improve the logic to check

> > that there is actually a mode change so that if there isn't (like moving from

> > one subreg to another) just expand to a single move?

> >

>

> Yes, but that is not a new issue. My patch is simply removing the TARGET_FP16 restrictions and

> merging two patterns that should be one using an iterator and nothing more.

>

> The redundant mov is already there and a different issue than the ICE I'm trying to fix.


It's there for movv4hf and movv6hf but your patch extends this problem
to movv2sf and movv4sf as well.

>

> None of the code inside the expander is needed at all, the code really only has an effect on subreg

> to subreg moves, as `force_reg` doesn't do anything when it's argument is already a reg.

>

> The comment in the expander (which was already there) is wrong. The *reason* the ICE is fixed isn't

> because of the `force_reg`. It's because of the mere presence of the expander itself. The expander matches the

> standard mov$a optab and so this prevents emit_move_insn_1 from doing the move by subwords as it finds a pattern

> that's able to do the move.


Could you then fix the comment in your patch as well? I hadn't
understood the force_reg was not key here. You might want to update
the following sentence from your patch description if you are going to
include it in your commit message:

The way this is worked around in the back-end is that we have move patterns in
neon.md that usually just force the register instead of checking with the
back-end.

"The way this is worked around (..) that just force the register" is
what led me to believe the force_reg was important.

>

> The expander however always falls through and doesn’t stop RTL generation. You could remove all the code in there and have

> it properly match the *neon_mov instructions which will do the right thing later at code generation time and avoid the redundant

> moves.  My guess is the original `force_reg` was copied from the other patterns like `movti` and the existing `mov<mode>`. There It makes

> sense because the operands can be MEM or anything general_operand.

>

> However the redundant moves are a different problem than what I'm trying to solve here. So I think that's another patch which requires further

> testing.


I was just thinking of restricting when does the force_reg happens but
if it can be removed completely I agree it should probably be done in
a separate patch.

Oh by the way, is there something that prevent those expander to ever
be used with a memory operand? Because the GCC internals contains the
following piece for mov standard pattern (bold marks added by me):

"Second, these patterns are not used solely in the RTL generation pass. Even
the reload pass can generate move insns to copy values from stack slots into
temporary registers. When it does so, one of the operands is a hard register
and the other is an operand that can need to be reloaded into a register.
Therefore, when given such a pair of operands, the pattern must generate RTL
which needs no reloading and needs no temporary registers—no registers other
than the operands. For example, if you support the pattern with a define_
expand, then in such a case the define_expand *mustn’t call force_reg* or any
other such function which might generate new pseudo registers."

Best regards,

Thomas

>

> Regards,

> Tamar

>

> > Best regards,

> >

> > Thomas

> >

> > >

> > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed without

> > > introducing any regressions while fixing

> > >

> > > gcc.dg/vect/vect-nop-move.c execution test

> > > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test

> > > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test

> > > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

> > >

> > > Regtested on armeb-none-eabi and no regressions.

> > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > >

> > >

> > > Ok for trunk?

> > >

> > > Thanks,

> > > Tamar

> > >

> > > gcc/

> > > 2018-07-23  Tamar Christina  <tamar.christina@arm.com>

> > >

> > >         PR target/84711

> > >         * config/arm/arm.c (arm_can_change_mode_class): Disallow subreg.

> > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > >         (mov<mov>): ..this and enable unconditionally.

> > >

> > > --
Tamar Christina July 26, 2018, 11:01 a.m. | #4
Hi Thomas,

> -----Original Message-----

> From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> Sent: Thursday, July 26, 2018 09:29

> To: Tamar Christina <Tamar.Christina@arm.com>

> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan

> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> <Kyrylo.Tkachov@arm.com>

> Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> enabling the FP16 pattern unconditionally.

> 

> Hi Tamar,

> 

> On Wed, 25 Jul 2018 at 16:28, Tamar Christina <Tamar.Christina@arm.com>

> wrote:

> >

> > Hi Thomas,

> >

> > Thanks for the review!

> >

> > > >

> > > > I don't believe the TARGET_FP16 guard to be needed, because the

> > > > pattern doesn't actually generate code and requires another

> > > > pattern for that, and a reg to reg move should always be possible

> > > > anyway. So allowing the force to register here is safe and it

> > > > allows the compiler to generate a correct error instead of ICEing in an

> infinite loop.

> > >

> > > How about subreg to subreg move? Doesn't that expand to more insns

> > > (subreg to reg and reg to subreg)? Couldn't you improve the logic to

> > > check that there is actually a mode change so that if there isn't

> > > (like moving from one subreg to another) just expand to a single move?

> > >

> >

> > Yes, but that is not a new issue. My patch is simply removing the

> > TARGET_FP16 restrictions and merging two patterns that should be one

> using an iterator and nothing more.

> >

> > The redundant mov is already there and a different issue than the ICE I'm

> trying to fix.

> 

> It's there for movv4hf and movv6hf but your patch extends this problem to

> movv2sf and movv4sf as well.


I don't understand how it can. My patch just replaces one pattern for V4HF and
one for V8HF with one pattern operating on VH.

;; Vector modes for 16-bit floating-point support.
(define_mode_iterator VH [V8HF V4HF])

My pattern has absolutely no effect on V2SF and V4SF or any of the other modes.

> 

> >

> > None of the code inside the expander is needed at all, the code really

> > only has an effect on subreg to subreg moves, as `force_reg` doesn't do

> anything when it's argument is already a reg.

> >

> > The comment in the expander (which was already there) is wrong. The

> > *reason* the ICE is fixed isn't because of the `force_reg`. It's

> > because of the mere presence of the expander itself. The expander

> > matches the standard mov$a optab and so this prevents

> emit_move_insn_1 from doing the move by subwords as it finds a pattern

> that's able to do the move.

> 

> Could you then fix the comment in your patch as well? I hadn't understood

> the force_reg was not key here. You might want to update the following

> sentence from your patch description if you are going to include it in your

> commit message:


I'll update the comment in the patch. The cover letter won't be included in the commit,
But it does accurately reflect the current state of affairs. The patch will do the force_reg,
It's just not the reason it works.

> 

> The way this is worked around in the back-end is that we have move

> patterns in neon.md that usually just force the register instead of checking

> with the back-end.

> 

> "The way this is worked around (..) that just force the register" is what led

> me to believe the force_reg was important.

> 

> >

> > The expander however always falls through and doesn’t stop RTL

> > generation. You could remove all the code in there and have it

> > properly match the *neon_mov instructions which will do the right

> > thing later at code generation time and avoid the redundant moves.  My

> guess is the original `force_reg` was copied from the other patterns like

> `movti` and the existing `mov<mode>`. There It makes sense because the

> operands can be MEM or anything general_operand.

> >

> > However the redundant moves are a different problem than what I'm

> > trying to solve here. So I think that's another patch which requires further

> testing.

> 

> I was just thinking of restricting when does the force_reg happens but if it

> can be removed completely I agree it should probably be done in a separate

> patch.

> 

> Oh by the way, is there something that prevent those expander to ever be

> used with a memory operand? Because the GCC internals contains the

> following piece for mov standard pattern (bold marks added by me):

> 

> "Second, these patterns are not used solely in the RTL generation pass. Even

> the reload pass can generate move insns to copy values from stack slots into

> temporary registers. When it does so, one of the operands is a hard register

> and the other is an operand that can need to be reloaded into a register.

> Therefore, when given such a pair of operands, the pattern must generate

> RTL which needs no reloading and needs no temporary registers—no

> registers other than the operands. For example, if you support the pattern

> with a define_ expand, then in such a case the define_expand *mustn’t call

> force_reg* or any other such function which might generate new pseudo

> registers."


When used during expand the operand predicates are checked, this prevents that case.
When used during reload can_create_pseudo_p () guards the actions in the patterns.

gcc/rtl.h:#define can_create_pseudo_p() (!reload_in_progress && !reload_completed)

So during or after reload it would just be an empty fall through pattern, so it won't do
anything to MEM or anything else.

Regards,
Tamar

> 

> Best regards,

> 

> Thomas

> 

> >

> > Regards,

> > Tamar

> >

> > > Best regards,

> > >

> > > Thomas

> > >

> > > >

> > > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed

> > > > without introducing any regressions while fixing

> > > >

> > > > gcc.dg/vect/vect-nop-move.c execution test

> > > > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test

> > > > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test

> > > > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

> > > >

> > > > Regtested on armeb-none-eabi and no regressions.

> > > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > > >

> > > >

> > > > Ok for trunk?

> > > >

> > > > Thanks,

> > > > Tamar

> > > >

> > > > gcc/

> > > > 2018-07-23  Tamar Christina  <tamar.christina@arm.com>

> > > >

> > > >         PR target/84711

> > > >         * config/arm/arm.c (arm_can_change_mode_class): Disallow

> subreg.

> > > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > > >         (mov<mov>): ..this and enable unconditionally.

> > > >

> > > > --
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index cf12aceb5fde303c9762c3d08c05a718146fe876..0a26a07dabbb9aa0501ca139b1d8be866ae04ac7 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -31512,8 +31512,8 @@ arm_can_change_mode_class (machine_mode from, machine_mode to,
 {
   if (TARGET_BIG_END
       && !(GET_MODE_SIZE (from) == 16 && GET_MODE_SIZE (to) == 8)
-      && (GET_MODE_UNIT_SIZE (from) > UNITS_PER_WORD
-	  || GET_MODE_UNIT_SIZE (to) > UNITS_PER_WORD)
+      && (GET_MODE_SIZE (from) > UNITS_PER_WORD
+	  || GET_MODE_SIZE (to) > UNITS_PER_WORD)
       && reg_classes_intersect_p (VFP_REGS, rclass))
     return false;
   return true;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 1646b2172970acaaf949ba8b77d43ec72b688d73..db53ab0b8bfe7fce6e12ee6c0ec7c1cdb0cb6d41 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -113,6 +113,13 @@
    (set_attr "thumb2_pool_range" "*,*,*,1018,*,*,*,1018,*")
    (set_attr "neg_pool_range" "*,*,*,996,*,*,*,996,*")])
 
+/* We define these mov expanders to match the standard mov$a optab to prevent
+   the mid-end from trying to do a subreg for these modes which is the most
+   inefficient way to expand the move.  Also big-endian subreg's aren't
+   allowed for a subset of modes, See TARGET_CAN_CHANGE_MODE_CLASS.
+   Without these RTL generation patterns the mid-end would attempt to take a
+   sub-reg and may ICE if it can't.  */
+
 (define_expand "movti"
   [(set (match_operand:TI 0 "nonimmediate_operand" "")
 	(match_operand:TI 1 "general_operand" ""))]
@@ -137,33 +144,15 @@
     }
 })
 
-(define_expand "movv4hf"
-  [(set (match_operand:V4HF 0 "s_register_operand")
-	(match_operand:V4HF 1 "s_register_operand"))]
-  "TARGET_NEON && TARGET_FP16"
+(define_expand "mov<mode>"
+  [(set (match_operand:VH 0 "s_register_operand")
+	(match_operand:VH 1 "s_register_operand"))]
+  "TARGET_NEON"
 {
-  /* We need to use force_reg to avoid TARGET_CAN_CHANGE_MODE_CLASS
-     causing an ICE on big-endian because it cannot extract subregs in
-     this case.  */
-  if (can_create_pseudo_p ())
-    {
-      if (!REG_P (operands[0]))
-	operands[1] = force_reg (V4HFmode, operands[1]);
-    }
-})
-
-(define_expand "movv8hf"
-  [(set (match_operand:V8HF 0 "")
-	(match_operand:V8HF 1 ""))]
-  "TARGET_NEON && TARGET_FP16"
-{ 
-  /* We need to use force_reg to avoid TARGET_CAN_CHANGE_MODE_CLASS
-     causing an ICE on big-endian because it cannot extract subregs in
-     this case.  */
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
-	operands[1] = force_reg (V8HFmode, operands[1]);
+	operands[1] = force_reg (<MODE>mode, operands[1]);
     }
 })
Thomas Preudhomme July 26, 2018, 11:28 a.m. | #5
On Thu, 26 Jul 2018 at 12:01, Tamar Christina <Tamar.Christina@arm.com> wrote:
>

> Hi Thomas,

>

> > -----Original Message-----

> > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> > Sent: Thursday, July 26, 2018 09:29

> > To: Tamar Christina <Tamar.Christina@arm.com>

> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan

> > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> > <Kyrylo.Tkachov@arm.com>

> > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> > enabling the FP16 pattern unconditionally.

> >

> > Hi Tamar,

> >

> > On Wed, 25 Jul 2018 at 16:28, Tamar Christina <Tamar.Christina@arm.com>

> > wrote:

> > >

> > > Hi Thomas,

> > >

> > > Thanks for the review!

> > >

> > > > >

> > > > > I don't believe the TARGET_FP16 guard to be needed, because the

> > > > > pattern doesn't actually generate code and requires another

> > > > > pattern for that, and a reg to reg move should always be possible

> > > > > anyway. So allowing the force to register here is safe and it

> > > > > allows the compiler to generate a correct error instead of ICEing in an

> > infinite loop.

> > > >

> > > > How about subreg to subreg move? Doesn't that expand to more insns

> > > > (subreg to reg and reg to subreg)? Couldn't you improve the logic to

> > > > check that there is actually a mode change so that if there isn't

> > > > (like moving from one subreg to another) just expand to a single move?

> > > >

> > >

> > > Yes, but that is not a new issue. My patch is simply removing the

> > > TARGET_FP16 restrictions and merging two patterns that should be one

> > using an iterator and nothing more.

> > >

> > > The redundant mov is already there and a different issue than the ICE I'm

> > trying to fix.

> >

> > It's there for movv4hf and movv6hf but your patch extends this problem to

> > movv2sf and movv4sf as well.

>

> I don't understand how it can. My patch just replaces one pattern for V4HF and

> one for V8HF with one pattern operating on VH.

>

> ;; Vector modes for 16-bit floating-point support.

> (define_mode_iterator VH [V8HF V4HF])

>

> My pattern has absolutely no effect on V2SF and V4SF or any of the other modes.


My bad, I was looking at VF.

>

> >

> > >

> > > None of the code inside the expander is needed at all, the code really

> > > only has an effect on subreg to subreg moves, as `force_reg` doesn't do

> > anything when it's argument is already a reg.

> > >

> > > The comment in the expander (which was already there) is wrong. The

> > > *reason* the ICE is fixed isn't because of the `force_reg`. It's

> > > because of the mere presence of the expander itself. The expander

> > > matches the standard mov$a optab and so this prevents

> > emit_move_insn_1 from doing the move by subwords as it finds a pattern

> > that's able to do the move.

> >

> > Could you then fix the comment in your patch as well? I hadn't understood

> > the force_reg was not key here. You might want to update the following

> > sentence from your patch description if you are going to include it in your

> > commit message:

>

> I'll update the comment in the patch. The cover letter won't be included in the commit,

> But it does accurately reflect the current state of affairs. The patch will do the force_reg,

> It's just not the reason it works.


Understood.

>

> >

> > The way this is worked around in the back-end is that we have move

> > patterns in neon.md that usually just force the register instead of checking

> > with the back-end.

> >

> > "The way this is worked around (..) that just force the register" is what led

> > me to believe the force_reg was important.

> >

> > >

> > > The expander however always falls through and doesn’t stop RTL

> > > generation. You could remove all the code in there and have it

> > > properly match the *neon_mov instructions which will do the right

> > > thing later at code generation time and avoid the redundant moves.  My

> > guess is the original `force_reg` was copied from the other patterns like

> > `movti` and the existing `mov<mode>`. There It makes sense because the

> > operands can be MEM or anything general_operand.

> > >

> > > However the redundant moves are a different problem than what I'm

> > > trying to solve here. So I think that's another patch which requires further

> > testing.

> >

> > I was just thinking of restricting when does the force_reg happens but if it

> > can be removed completely I agree it should probably be done in a separate

> > patch.

> >

> > Oh by the way, is there something that prevent those expander to ever be

> > used with a memory operand? Because the GCC internals contains the

> > following piece for mov standard pattern (bold marks added by me):

> >

> > "Second, these patterns are not used solely in the RTL generation pass. Even

> > the reload pass can generate move insns to copy values from stack slots into

> > temporary registers. When it does so, one of the operands is a hard register

> > and the other is an operand that can need to be reloaded into a register.

> > Therefore, when given such a pair of operands, the pattern must generate

> > RTL which needs no reloading and needs no temporary registers—no

> > registers other than the operands. For example, if you support the pattern

> > with a define_ expand, then in such a case the define_expand *mustn’t call

> > force_reg* or any other such function which might generate new pseudo

> > registers."

>

> When used during expand the operand predicates are checked, this prevents that case.

> When used during reload can_create_pseudo_p () guards the actions in the patterns.

>

> gcc/rtl.h:#define can_create_pseudo_p() (!reload_in_progress && !reload_completed)

>

> So during or after reload it would just be an empty fall through pattern, so it won't do

> anything to MEM or anything else.


Of course. LGTM with the mentioned changes to the comment. This is not
an approval though as I'm not a maintainer.

Best regards,

Thomas

>

> Regards,

> Tamar

>

> >

> > Best regards,

> >

> > Thomas

> >

> > >

> > > Regards,

> > > Tamar

> > >

> > > > Best regards,

> > > >

> > > > Thomas

> > > >

> > > > >

> > > > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed

> > > > > without introducing any regressions while fixing

> > > > >

> > > > > gcc.dg/vect/vect-nop-move.c execution test

> > > > > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test

> > > > > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test

> > > > > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

> > > > >

> > > > > Regtested on armeb-none-eabi and no regressions.

> > > > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > > > >

> > > > >

> > > > > Ok for trunk?

> > > > >

> > > > > Thanks,

> > > > > Tamar

> > > > >

> > > > > gcc/

> > > > > 2018-07-23  Tamar Christina  <tamar.christina@arm.com>

> > > > >

> > > > >         PR target/84711

> > > > >         * config/arm/arm.c (arm_can_change_mode_class): Disallow

> > subreg.

> > > > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > > > >         (mov<mov>): ..this and enable unconditionally.

> > > > >

> > > > > --
Tamar Christina July 31, 2018, 9:46 a.m. | #6
Ping 😊

> -----Original Message-----

> From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> Sent: Thursday, July 26, 2018 12:29

> To: Tamar Christina <Tamar.Christina@arm.com>

> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan

> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> <Kyrylo.Tkachov@arm.com>

> Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> enabling the FP16 pattern unconditionally.

> 

> On Thu, 26 Jul 2018 at 12:01, Tamar Christina <Tamar.Christina@arm.com>

> wrote:

> >

> > Hi Thomas,

> >

> > > -----Original Message-----

> > > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> > > Sent: Thursday, July 26, 2018 09:29

> > > To: Tamar Christina <Tamar.Christina@arm.com>

> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana

> Radhakrishnan

> > > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> > > <Kyrylo.Tkachov@arm.com>

> > > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> > > enabling the FP16 pattern unconditionally.

> > >

> > > Hi Tamar,

> > >

> > > On Wed, 25 Jul 2018 at 16:28, Tamar Christina

> > > <Tamar.Christina@arm.com>

> > > wrote:

> > > >

> > > > Hi Thomas,

> > > >

> > > > Thanks for the review!

> > > >

> > > > > >

> > > > > > I don't believe the TARGET_FP16 guard to be needed, because

> > > > > > the pattern doesn't actually generate code and requires

> > > > > > another pattern for that, and a reg to reg move should always

> > > > > > be possible anyway. So allowing the force to register here is

> > > > > > safe and it allows the compiler to generate a correct error

> > > > > > instead of ICEing in an

> > > infinite loop.

> > > > >

> > > > > How about subreg to subreg move? Doesn't that expand to more

> > > > > insns (subreg to reg and reg to subreg)? Couldn't you improve

> > > > > the logic to check that there is actually a mode change so that

> > > > > if there isn't (like moving from one subreg to another) just expand to

> a single move?

> > > > >

> > > >

> > > > Yes, but that is not a new issue. My patch is simply removing the

> > > > TARGET_FP16 restrictions and merging two patterns that should be

> > > > one

> > > using an iterator and nothing more.

> > > >

> > > > The redundant mov is already there and a different issue than the

> > > > ICE I'm

> > > trying to fix.

> > >

> > > It's there for movv4hf and movv6hf but your patch extends this

> > > problem to movv2sf and movv4sf as well.

> >

> > I don't understand how it can. My patch just replaces one pattern for

> > V4HF and one for V8HF with one pattern operating on VH.

> >

> > ;; Vector modes for 16-bit floating-point support.

> > (define_mode_iterator VH [V8HF V4HF])

> >

> > My pattern has absolutely no effect on V2SF and V4SF or any of the other

> modes.

> 

> My bad, I was looking at VF.

> 

> >

> > >

> > > >

> > > > None of the code inside the expander is needed at all, the code

> > > > really only has an effect on subreg to subreg moves, as

> > > > `force_reg` doesn't do

> > > anything when it's argument is already a reg.

> > > >

> > > > The comment in the expander (which was already there) is wrong.

> > > > The

> > > > *reason* the ICE is fixed isn't because of the `force_reg`. It's

> > > > because of the mere presence of the expander itself. The expander

> > > > matches the standard mov$a optab and so this prevents

> > > emit_move_insn_1 from doing the move by subwords as it finds a

> > > pattern that's able to do the move.

> > >

> > > Could you then fix the comment in your patch as well? I hadn't

> > > understood the force_reg was not key here. You might want to update

> > > the following sentence from your patch description if you are going

> > > to include it in your commit message:

> >

> > I'll update the comment in the patch. The cover letter won't be

> > included in the commit, But it does accurately reflect the current

> > state of affairs. The patch will do the force_reg, It's just not the reason it

> works.

> 

> Understood.

> 

> >

> > >

> > > The way this is worked around in the back-end is that we have move

> > > patterns in neon.md that usually just force the register instead of

> > > checking with the back-end.

> > >

> > > "The way this is worked around (..) that just force the register" is

> > > what led me to believe the force_reg was important.

> > >

> > > >

> > > > The expander however always falls through and doesn’t stop RTL

> > > > generation. You could remove all the code in there and have it

> > > > properly match the *neon_mov instructions which will do the right

> > > > thing later at code generation time and avoid the redundant moves.

> > > > My

> > > guess is the original `force_reg` was copied from the other patterns

> > > like `movti` and the existing `mov<mode>`. There It makes sense

> > > because the operands can be MEM or anything general_operand.

> > > >

> > > > However the redundant moves are a different problem than what I'm

> > > > trying to solve here. So I think that's another patch which

> > > > requires further

> > > testing.

> > >

> > > I was just thinking of restricting when does the force_reg happens

> > > but if it can be removed completely I agree it should probably be

> > > done in a separate patch.

> > >

> > > Oh by the way, is there something that prevent those expander to

> > > ever be used with a memory operand? Because the GCC internals

> > > contains the following piece for mov standard pattern (bold marks added

> by me):

> > >

> > > "Second, these patterns are not used solely in the RTL generation

> > > pass. Even the reload pass can generate move insns to copy values

> > > from stack slots into temporary registers. When it does so, one of

> > > the operands is a hard register and the other is an operand that can need

> to be reloaded into a register.

> > > Therefore, when given such a pair of operands, the pattern must

> > > generate RTL which needs no reloading and needs no temporary

> > > registers—no registers other than the operands. For example, if you

> > > support the pattern with a define_ expand, then in such a case the

> > > define_expand *mustn’t call

> > > force_reg* or any other such function which might generate new

> > > pseudo registers."

> >

> > When used during expand the operand predicates are checked, this

> prevents that case.

> > When used during reload can_create_pseudo_p () guards the actions in

> the patterns.

> >

> > gcc/rtl.h:#define can_create_pseudo_p() (!reload_in_progress &&

> > !reload_completed)

> >

> > So during or after reload it would just be an empty fall through

> > pattern, so it won't do anything to MEM or anything else.

> 

> Of course. LGTM with the mentioned changes to the comment. This is not an

> approval though as I'm not a maintainer.

> 

> Best regards,

> 

> Thomas

> 

> >

> > Regards,

> > Tamar

> >

> > >

> > > Best regards,

> > >

> > > Thomas

> > >

> > > >

> > > > Regards,

> > > > Tamar

> > > >

> > > > > Best regards,

> > > > >

> > > > > Thomas

> > > > >

> > > > > >

> > > > > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed

> > > > > > without introducing any regressions while fixing

> > > > > >

> > > > > > gcc.dg/vect/vect-nop-move.c execution test

> > > > > > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test

> > > > > > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test

> > > > > > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

> > > > > >

> > > > > > Regtested on armeb-none-eabi and no regressions.

> > > > > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > > > > >

> > > > > >

> > > > > > Ok for trunk?

> > > > > >

> > > > > > Thanks,

> > > > > > Tamar

> > > > > >

> > > > > > gcc/

> > > > > > 2018-07-23  Tamar Christina  <tamar.christina@arm.com>

> > > > > >

> > > > > >         PR target/84711

> > > > > >         * config/arm/arm.c (arm_can_change_mode_class):

> > > > > > Disallow

> > > subreg.

> > > > > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > > > > >         (mov<mov>): ..this and enable unconditionally.

> > > > > >

> > > > > > --
Tamar Christina Aug. 13, 2018, 9:31 a.m. | #7
Ping

> -----Original Message-----

> From: gcc-patches-owner@gcc.gnu.org <gcc-patches-owner@gcc.gnu.org>

> On Behalf Of Tamar Christina

> Sent: Tuesday, July 31, 2018 10:47

> To: gcc-patches@gcc.gnu.org

> Cc: nd <nd@arm.com>; Ramana Radhakrishnan

> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> <Kyrylo.Tkachov@arm.com>; Thomas Preudhomme

> <thomas.preudhomme@linaro.org>

> Subject: RE: [PATCH][GCC][Arm] Fix subreg crash in different way by

> enabling the FP16 pattern unconditionally.

> 

> Ping 😊

> 

> > -----Original Message-----

> > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> > Sent: Thursday, July 26, 2018 12:29

> > To: Tamar Christina <Tamar.Christina@arm.com>

> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan

> > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> > <Kyrylo.Tkachov@arm.com>

> > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> > enabling the FP16 pattern unconditionally.

> >

> > On Thu, 26 Jul 2018 at 12:01, Tamar Christina

> > <Tamar.Christina@arm.com>

> > wrote:

> > >

> > > Hi Thomas,

> > >

> > > > -----Original Message-----

> > > > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> > > > Sent: Thursday, July 26, 2018 09:29

> > > > To: Tamar Christina <Tamar.Christina@arm.com>

> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana

> > Radhakrishnan

> > > > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > > > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> > > > <Kyrylo.Tkachov@arm.com>

> > > > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way

> > > > by enabling the FP16 pattern unconditionally.

> > > >

> > > > Hi Tamar,

> > > >

> > > > On Wed, 25 Jul 2018 at 16:28, Tamar Christina

> > > > <Tamar.Christina@arm.com>

> > > > wrote:

> > > > >

> > > > > Hi Thomas,

> > > > >

> > > > > Thanks for the review!

> > > > >

> > > > > > >

> > > > > > > I don't believe the TARGET_FP16 guard to be needed, because

> > > > > > > the pattern doesn't actually generate code and requires

> > > > > > > another pattern for that, and a reg to reg move should

> > > > > > > always be possible anyway. So allowing the force to register

> > > > > > > here is safe and it allows the compiler to generate a

> > > > > > > correct error instead of ICEing in an

> > > > infinite loop.

> > > > > >

> > > > > > How about subreg to subreg move? Doesn't that expand to more

> > > > > > insns (subreg to reg and reg to subreg)? Couldn't you improve

> > > > > > the logic to check that there is actually a mode change so

> > > > > > that if there isn't (like moving from one subreg to another)

> > > > > > just expand to

> > a single move?

> > > > > >

> > > > >

> > > > > Yes, but that is not a new issue. My patch is simply removing

> > > > > the

> > > > > TARGET_FP16 restrictions and merging two patterns that should be

> > > > > one

> > > > using an iterator and nothing more.

> > > > >

> > > > > The redundant mov is already there and a different issue than

> > > > > the ICE I'm

> > > > trying to fix.

> > > >

> > > > It's there for movv4hf and movv6hf but your patch extends this

> > > > problem to movv2sf and movv4sf as well.

> > >

> > > I don't understand how it can. My patch just replaces one pattern

> > > for V4HF and one for V8HF with one pattern operating on VH.

> > >

> > > ;; Vector modes for 16-bit floating-point support.

> > > (define_mode_iterator VH [V8HF V4HF])

> > >

> > > My pattern has absolutely no effect on V2SF and V4SF or any of the

> > > other

> > modes.

> >

> > My bad, I was looking at VF.

> >

> > >

> > > >

> > > > >

> > > > > None of the code inside the expander is needed at all, the code

> > > > > really only has an effect on subreg to subreg moves, as

> > > > > `force_reg` doesn't do

> > > > anything when it's argument is already a reg.

> > > > >

> > > > > The comment in the expander (which was already there) is wrong.

> > > > > The

> > > > > *reason* the ICE is fixed isn't because of the `force_reg`. It's

> > > > > because of the mere presence of the expander itself. The

> > > > > expander matches the standard mov$a optab and so this prevents

> > > > emit_move_insn_1 from doing the move by subwords as it finds a

> > > > pattern that's able to do the move.

> > > >

> > > > Could you then fix the comment in your patch as well? I hadn't

> > > > understood the force_reg was not key here. You might want to

> > > > update the following sentence from your patch description if you

> > > > are going to include it in your commit message:

> > >

> > > I'll update the comment in the patch. The cover letter won't be

> > > included in the commit, But it does accurately reflect the current

> > > state of affairs. The patch will do the force_reg, It's just not the

> > > reason it

> > works.

> >

> > Understood.

> >

> > >

> > > >

> > > > The way this is worked around in the back-end is that we have move

> > > > patterns in neon.md that usually just force the register instead

> > > > of checking with the back-end.

> > > >

> > > > "The way this is worked around (..) that just force the register"

> > > > is what led me to believe the force_reg was important.

> > > >

> > > > >

> > > > > The expander however always falls through and doesn’t stop RTL

> > > > > generation. You could remove all the code in there and have it

> > > > > properly match the *neon_mov instructions which will do the

> > > > > right thing later at code generation time and avoid the redundant

> moves.

> > > > > My

> > > > guess is the original `force_reg` was copied from the other

> > > > patterns like `movti` and the existing `mov<mode>`. There It makes

> > > > sense because the operands can be MEM or anything general_operand.

> > > > >

> > > > > However the redundant moves are a different problem than what

> > > > > I'm trying to solve here. So I think that's another patch which

> > > > > requires further

> > > > testing.

> > > >

> > > > I was just thinking of restricting when does the force_reg happens

> > > > but if it can be removed completely I agree it should probably be

> > > > done in a separate patch.

> > > >

> > > > Oh by the way, is there something that prevent those expander to

> > > > ever be used with a memory operand? Because the GCC internals

> > > > contains the following piece for mov standard pattern (bold marks

> > > > added

> > by me):

> > > >

> > > > "Second, these patterns are not used solely in the RTL generation

> > > > pass. Even the reload pass can generate move insns to copy values

> > > > from stack slots into temporary registers. When it does so, one of

> > > > the operands is a hard register and the other is an operand that

> > > > can need

> > to be reloaded into a register.

> > > > Therefore, when given such a pair of operands, the pattern must

> > > > generate RTL which needs no reloading and needs no temporary

> > > > registers—no registers other than the operands. For example, if

> > > > you support the pattern with a define_ expand, then in such a case

> > > > the define_expand *mustn’t call

> > > > force_reg* or any other such function which might generate new

> > > > pseudo registers."

> > >

> > > When used during expand the operand predicates are checked, this

> > prevents that case.

> > > When used during reload can_create_pseudo_p () guards the actions in

> > the patterns.

> > >

> > > gcc/rtl.h:#define can_create_pseudo_p() (!reload_in_progress &&

> > > !reload_completed)

> > >

> > > So during or after reload it would just be an empty fall through

> > > pattern, so it won't do anything to MEM or anything else.

> >

> > Of course. LGTM with the mentioned changes to the comment. This is not

> > an approval though as I'm not a maintainer.

> >

> > Best regards,

> >

> > Thomas

> >

> > >

> > > Regards,

> > > Tamar

> > >

> > > >

> > > > Best regards,

> > > >

> > > > Thomas

> > > >

> > > > >

> > > > > Regards,

> > > > > Tamar

> > > > >

> > > > > > Best regards,

> > > > > >

> > > > > > Thomas

> > > > > >

> > > > > > >

> > > > > > > This patch ensures gcc.target/arm/big-endian-subreg.c is

> > > > > > > fixed without introducing any regressions while fixing

> > > > > > >

> > > > > > > gcc.dg/vect/vect-nop-move.c execution test

> > > > > > > g++.dg/torture/vshuf-v2si.C   -O3 -g  execution test

> > > > > > > g++.dg/torture/vshuf-v4si.C   -O3 -g  execution test

> > > > > > > g++.dg/torture/vshuf-v8hi.C   -O3 -g  execution test

> > > > > > >

> > > > > > > Regtested on armeb-none-eabi and no regressions.

> > > > > > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > > > > > >

> > > > > > >

> > > > > > > Ok for trunk?

> > > > > > >

> > > > > > > Thanks,

> > > > > > > Tamar

> > > > > > >

> > > > > > > gcc/

> > > > > > > 2018-07-23  Tamar Christina  <tamar.christina@arm.com>

> > > > > > >

> > > > > > >         PR target/84711

> > > > > > >         * config/arm/arm.c (arm_can_change_mode_class):

> > > > > > > Disallow

> > > > subreg.

> > > > > > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > > > > > >         (mov<mov>): ..this and enable unconditionally.

> > > > > > >

> > > > > > > --
Kyrill Tkachov Aug. 15, 2018, 2:25 p.m. | #8
Hi Tamar,

On 26/07/18 12:01, Tamar Christina wrote:
> Hi Thomas,

>

> > -----Original Message-----

> > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> > Sent: Thursday, July 26, 2018 09:29

> > To: Tamar Christina <Tamar.Christina@arm.com>

> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan

> > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> > <Kyrylo.Tkachov@arm.com>

> > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> > enabling the FP16 pattern unconditionally.

> >

> > Hi Tamar,

> >

> > On Wed, 25 Jul 2018 at 16:28, Tamar Christina <Tamar.Christina@arm.com>

> > wrote:

> > >

> > > Hi Thomas,

> > >

> > > Thanks for the review!

> > >

> > > > >

> > > > > I don't believe the TARGET_FP16 guard to be needed, because the

> > > > > pattern doesn't actually generate code and requires another

> > > > > pattern for that, and a reg to reg move should always be possible

> > > > > anyway. So allowing the force to register here is safe and it

> > > > > allows the compiler to generate a correct error instead of ICEing in an

> > infinite loop.

> > > >

> > > > How about subreg to subreg move? Doesn't that expand to more insns

> > > > (subreg to reg and reg to subreg)? Couldn't you improve the logic to

> > > > check that there is actually a mode change so that if there isn't

> > > > (like moving from one subreg to another) just expand to a single move?

> > > >

> > >

> > > Yes, but that is not a new issue. My patch is simply removing the

> > > TARGET_FP16 restrictions and merging two patterns that should be one

> > using an iterator and nothing more.

> > >

> > > The redundant mov is already there and a different issue than the ICE I'm

> > trying to fix.

> >

> > It's there for movv4hf and movv6hf but your patch extends this problem to

> > movv2sf and movv4sf as well.

>

> I don't understand how it can. My patch just replaces one pattern for V4HF and

> one for V8HF with one pattern operating on VH.

>

> ;; Vector modes for 16-bit floating-point support.

> (define_mode_iterator VH [V8HF V4HF])

>

> My pattern has absolutely no effect on V2SF and V4SF or any of the other modes.

>

> >

> > >

> > > None of the code inside the expander is needed at all, the code really

> > > only has an effect on subreg to subreg moves, as `force_reg` doesn't do

> > anything when it's argument is already a reg.

> > >

> > > The comment in the expander (which was already there) is wrong. The

> > > *reason* the ICE is fixed isn't because of the `force_reg`. It's

> > > because of the mere presence of the expander itself. The expander

> > > matches the standard mov$a optab and so this prevents

> > emit_move_insn_1 from doing the move by subwords as it finds a pattern

> > that's able to do the move.

> >

> > Could you then fix the comment in your patch as well? I hadn't understood

> > the force_reg was not key here. You might want to update the following

> > sentence from your patch description if you are going to include it in your

> > commit message:

>

> I'll update the comment in the patch. The cover letter won't be included in the commit,

> But it does accurately reflect the current state of affairs. The patch will do the force_reg,

> It's just not the reason it works.

>

> >

> > The way this is worked around in the back-end is that we have move

> > patterns in neon.md that usually just force the register instead of checking

> > with the back-end.

> >

> > "The way this is worked around (..) that just force the register" is what led

> > me to believe the force_reg was important.

> >

> > >

> > > The expander however always falls through and doesn’t stop RTL

> > > generation. You could remove all the code in there and have it

> > > properly match the *neon_mov instructions which will do the right

> > > thing later at code generation time and avoid the redundant moves.  My

> > guess is the original `force_reg` was copied from the other patterns like

> > `movti` and the existing `mov<mode>`. There It makes sense because the

> > operands can be MEM or anything general_operand.

> > >

> > > However the redundant moves are a different problem than what I'm

> > > trying to solve here. So I think that's another patch which requires further

> > testing.

> >

> > I was just thinking of restricting when does the force_reg happens but if it

> > can be removed completely I agree it should probably be done in a separate

> > patch.

> >

> > Oh by the way, is there something that prevent those expander to ever be

> > used with a memory operand? Because the GCC internals contains the

> > following piece for mov standard pattern (bold marks added by me):

> >

> > "Second, these patterns are not used solely in the RTL generation pass. Even

> > the reload pass can generate move insns to copy values from stack slots into

> > temporary registers. When it does so, one of the operands is a hard register

> > and the other is an operand that can need to be reloaded into a register.

> > Therefore, when given such a pair of operands, the pattern must generate

> > RTL which needs no reloading and needs no temporary registers—no

> > registers other than the operands. For example, if you support the pattern

> > with a define_ expand, then in such a case the define_expand *mustn’t call

> > force_reg* or any other such function which might generate new pseudo

> > registers."

>

> When used during expand the operand predicates are checked, this prevents that case.

> When used during reload can_create_pseudo_p () guards the actions in the patterns.

>

> gcc/rtl.h:#define can_create_pseudo_p() (!reload_in_progress && !reload_completed)

>

> So during or after reload it would just be an empty fall through pattern, so it won't do

> anything to MEM or anything else.

>


This is ok for trunk.
Thanks for looking at it as well Thomas,

Kyrill

> Regards,

> Tamar

>

> >

> > Best regards,

> >

> > Thomas

> >

> > >

> > > Regards,

> > > Tamar

> > >

> > > > Best regards,

> > > >

> > > > Thomas

> > > >

> > > > >

> > > > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed

> > > > > without introducing any regressions while fixing

> > > > >

> > > > > gcc.dg/vect/vect-nop-move.c execution test

> > > > > g++.dg/torture/vshuf-v2si.C   -O3 -g execution test

> > > > > g++.dg/torture/vshuf-v4si.C   -O3 -g execution test

> > > > > g++.dg/torture/vshuf-v8hi.C   -O3 -g execution test

> > > > >

> > > > > Regtested on armeb-none-eabi and no regressions.

> > > > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > > > >

> > > > >

> > > > > Ok for trunk?

> > > > >

> > > > > Thanks,

> > > > > Tamar

> > > > >

> > > > > gcc/

> > > > > 2018-07-23  Tamar Christina <tamar.christina@arm.com>

> > > > >

> > > > >         PR target/84711

> > > > >         * config/arm/arm.c (arm_can_change_mode_class): Disallow

> > subreg.

> > > > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > > > >         (mov<mov>): ..this and enable unconditionally.

> > > > >

> > > > > --
Tamar Christina Sept. 18, 2018, 12:55 p.m. | #9
Hi All,

I'm looking for a backport of this patch to GCC8.

Ok for backport?

Thanks,
Tamar

> -----Original Message-----

> From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>

> Sent: Wednesday, August 15, 2018 15:25

> To: Tamar Christina <Tamar.Christina@arm.com>; Thomas Preudhomme

> <thomas.preudhomme@linaro.org>

> Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan

> <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> <Richard.Earnshaw@arm.com>; nickc@redhat.com

> Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> enabling the FP16 pattern unconditionally.

> 

> Hi Tamar,

> 

> On 26/07/18 12:01, Tamar Christina wrote:

> > Hi Thomas,

> >

> > > -----Original Message-----

> > > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> > > Sent: Thursday, July 26, 2018 09:29

> > > To: Tamar Christina <Tamar.Christina@arm.com>

> > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana

> Radhakrishnan

> > > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> > > <Kyrylo.Tkachov@arm.com>

> > > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> > > enabling the FP16 pattern unconditionally.

> > >

> > > Hi Tamar,

> > >

> > > On Wed, 25 Jul 2018 at 16:28, Tamar Christina

> > > <Tamar.Christina@arm.com>

> > > wrote:

> > > >

> > > > Hi Thomas,

> > > >

> > > > Thanks for the review!

> > > >

> > > > > >

> > > > > > I don't believe the TARGET_FP16 guard to be needed, because

> > > > > > the pattern doesn't actually generate code and requires

> > > > > > another pattern for that, and a reg to reg move should always

> > > > > > be possible anyway. So allowing the force to register here is

> > > > > > safe and it allows the compiler to generate a correct error

> > > > > > instead of ICEing in an

> > > infinite loop.

> > > > >

> > > > > How about subreg to subreg move? Doesn't that expand to more

> > > > > insns (subreg to reg and reg to subreg)? Couldn't you improve

> > > > > the logic to check that there is actually a mode change so that

> > > > > if there isn't (like moving from one subreg to another) just expand to

> a single move?

> > > > >

> > > >

> > > > Yes, but that is not a new issue. My patch is simply removing the

> > > > TARGET_FP16 restrictions and merging two patterns that should be

> > > > one

> > > using an iterator and nothing more.

> > > >

> > > > The redundant mov is already there and a different issue than the

> > > > ICE I'm

> > > trying to fix.

> > >

> > > It's there for movv4hf and movv6hf but your patch extends this

> > > problem to movv2sf and movv4sf as well.

> >

> > I don't understand how it can. My patch just replaces one pattern for

> > V4HF and one for V8HF with one pattern operating on VH.

> >

> > ;; Vector modes for 16-bit floating-point support.

> > (define_mode_iterator VH [V8HF V4HF])

> >

> > My pattern has absolutely no effect on V2SF and V4SF or any of the other

> modes.

> >

> > >

> > > >

> > > > None of the code inside the expander is needed at all, the code

> > > > really only has an effect on subreg to subreg moves, as

> > > > `force_reg` doesn't do

> > > anything when it's argument is already a reg.

> > > >

> > > > The comment in the expander (which was already there) is wrong.

> > > > The

> > > > *reason* the ICE is fixed isn't because of the `force_reg`. It's

> > > > because of the mere presence of the expander itself. The expander

> > > > matches the standard mov$a optab and so this prevents

> > > emit_move_insn_1 from doing the move by subwords as it finds a

> > > pattern that's able to do the move.

> > >

> > > Could you then fix the comment in your patch as well? I hadn't

> > > understood the force_reg was not key here. You might want to update

> > > the following sentence from your patch description if you are going

> > > to include it in your commit message:

> >

> > I'll update the comment in the patch. The cover letter won't be

> > included in the commit, But it does accurately reflect the current

> > state of affairs. The patch will do the force_reg, It's just not the reason it

> works.

> >

> > >

> > > The way this is worked around in the back-end is that we have move

> > > patterns in neon.md that usually just force the register instead of

> > > checking with the back-end.

> > >

> > > "The way this is worked around (..) that just force the register" is

> > > what led me to believe the force_reg was important.

> > >

> > > >

> > > > The expander however always falls through and doesn’t stop RTL

> > > > generation. You could remove all the code in there and have it

> > > > properly match the *neon_mov instructions which will do the right

> > > > thing later at code generation time and avoid the redundant moves.

> > > > My

> > > guess is the original `force_reg` was copied from the other patterns

> > > like `movti` and the existing `mov<mode>`. There It makes sense

> > > because the operands can be MEM or anything general_operand.

> > > >

> > > > However the redundant moves are a different problem than what I'm

> > > > trying to solve here. So I think that's another patch which

> > > > requires further

> > > testing.

> > >

> > > I was just thinking of restricting when does the force_reg happens

> > > but if it can be removed completely I agree it should probably be

> > > done in a separate patch.

> > >

> > > Oh by the way, is there something that prevent those expander to

> > > ever be used with a memory operand? Because the GCC internals

> > > contains the following piece for mov standard pattern (bold marks added

> by me):

> > >

> > > "Second, these patterns are not used solely in the RTL generation

> > > pass. Even the reload pass can generate move insns to copy values

> > > from stack slots into temporary registers. When it does so, one of

> > > the operands is a hard register and the other is an operand that can need

> to be reloaded into a register.

> > > Therefore, when given such a pair of operands, the pattern must

> > > generate RTL which needs no reloading and needs no temporary

> > > registers—no registers other than the operands. For example, if you

> > > support the pattern with a define_ expand, then in such a case the

> > > define_expand *mustn’t call

> > > force_reg* or any other such function which might generate new

> > > pseudo registers."

> >

> > When used during expand the operand predicates are checked, this

> prevents that case.

> > When used during reload can_create_pseudo_p () guards the actions in

> the patterns.

> >

> > gcc/rtl.h:#define can_create_pseudo_p() (!reload_in_progress &&

> > !reload_completed)

> >

> > So during or after reload it would just be an empty fall through

> > pattern, so it won't do anything to MEM or anything else.

> >

> 

> This is ok for trunk.

> Thanks for looking at it as well Thomas,

> 

> Kyrill

> 

> > Regards,

> > Tamar

> >

> > >

> > > Best regards,

> > >

> > > Thomas

> > >

> > > >

> > > > Regards,

> > > > Tamar

> > > >

> > > > > Best regards,

> > > > >

> > > > > Thomas

> > > > >

> > > > > >

> > > > > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed

> > > > > > without introducing any regressions while fixing

> > > > > >

> > > > > > gcc.dg/vect/vect-nop-move.c execution test

> > > > > > g++.dg/torture/vshuf-v2si.C   -O3 -g execution test

> > > > > > g++.dg/torture/vshuf-v4si.C   -O3 -g execution test

> > > > > > g++.dg/torture/vshuf-v8hi.C   -O3 -g execution test

> > > > > >

> > > > > > Regtested on armeb-none-eabi and no regressions.

> > > > > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > > > > >

> > > > > >

> > > > > > Ok for trunk?

> > > > > >

> > > > > > Thanks,

> > > > > > Tamar

> > > > > >

> > > > > > gcc/

> > > > > > 2018-07-23  Tamar Christina <tamar.christina@arm.com>

> > > > > >

> > > > > >         PR target/84711

> > > > > >         * config/arm/arm.c (arm_can_change_mode_class):

> > > > > > Disallow

> > > subreg.

> > > > > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > > > > >         (mov<mov>): ..this and enable unconditionally.

> > > > > >

> > > > > > --
Kyrill Tkachov Sept. 18, 2018, 1:18 p.m. | #10
Hi Tamar,

On 18/09/18 13:55, Tamar Christina wrote:
> Hi All,

>

> I'm looking for a backport of this patch to GCC8.

>

> Ok for backport?

>


Ok if testing on that branch shows no problems.

Thanks,
Kyrill

> Thanks,

> Tamar

>

> > -----Original Message-----

> > From: Kyrill Tkachov <kyrylo.tkachov@foss.arm.com>

> > Sent: Wednesday, August 15, 2018 15:25

> > To: Tamar Christina <Tamar.Christina@arm.com>; Thomas Preudhomme

> > <thomas.preudhomme@linaro.org>

> > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana Radhakrishnan

> > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > <Richard.Earnshaw@arm.com>; nickc@redhat.com

> > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> > enabling the FP16 pattern unconditionally.

> >

> > Hi Tamar,

> >

> > On 26/07/18 12:01, Tamar Christina wrote:

> > > Hi Thomas,

> > >

> > > > -----Original Message-----

> > > > From: Thomas Preudhomme <thomas.preudhomme@linaro.org>

> > > > Sent: Thursday, July 26, 2018 09:29

> > > > To: Tamar Christina <Tamar.Christina@arm.com>

> > > > Cc: gcc-patches@gcc.gnu.org; nd <nd@arm.com>; Ramana

> > Radhakrishnan

> > > > <Ramana.Radhakrishnan@arm.com>; Richard Earnshaw

> > > > <Richard.Earnshaw@arm.com>; nickc@redhat.com; Kyrylo Tkachov

> > > > <Kyrylo.Tkachov@arm.com>

> > > > Subject: Re: [PATCH][GCC][Arm] Fix subreg crash in different way by

> > > > enabling the FP16 pattern unconditionally.

> > > >

> > > > Hi Tamar,

> > > >

> > > > On Wed, 25 Jul 2018 at 16:28, Tamar Christina

> > > > <Tamar.Christina@arm.com>

> > > > wrote:

> > > > >

> > > > > Hi Thomas,

> > > > >

> > > > > Thanks for the review!

> > > > >

> > > > > > >

> > > > > > > I don't believe the TARGET_FP16 guard to be needed, because

> > > > > > > the pattern doesn't actually generate code and requires

> > > > > > > another pattern for that, and a reg to reg move should always

> > > > > > > be possible anyway. So allowing the force to register here is

> > > > > > > safe and it allows the compiler to generate a correct error

> > > > > > > instead of ICEing in an

> > > > infinite loop.

> > > > > >

> > > > > > How about subreg to subreg move? Doesn't that expand to more

> > > > > > insns (subreg to reg and reg to subreg)? Couldn't you improve

> > > > > > the logic to check that there is actually a mode change so that

> > > > > > if there isn't (like moving from one subreg to another) just expand to

> > a single move?

> > > > > >

> > > > >

> > > > > Yes, but that is not a new issue. My patch is simply removing the

> > > > > TARGET_FP16 restrictions and merging two patterns that should be

> > > > > one

> > > > using an iterator and nothing more.

> > > > >

> > > > > The redundant mov is already there and a different issue than the

> > > > > ICE I'm

> > > > trying to fix.

> > > >

> > > > It's there for movv4hf and movv6hf but your patch extends this

> > > > problem to movv2sf and movv4sf as well.

> > >

> > > I don't understand how it can. My patch just replaces one pattern for

> > > V4HF and one for V8HF with one pattern operating on VH.

> > >

> > > ;; Vector modes for 16-bit floating-point support.

> > > (define_mode_iterator VH [V8HF V4HF])

> > >

> > > My pattern has absolutely no effect on V2SF and V4SF or any of the other

> > modes.

> > >

> > > >

> > > > >

> > > > > None of the code inside the expander is needed at all, the code

> > > > > really only has an effect on subreg to subreg moves, as

> > > > > `force_reg` doesn't do

> > > > anything when it's argument is already a reg.

> > > > >

> > > > > The comment in the expander (which was already there) is wrong.

> > > > > The

> > > > > *reason* the ICE is fixed isn't because of the `force_reg`. It's

> > > > > because of the mere presence of the expander itself. The expander

> > > > > matches the standard mov$a optab and so this prevents

> > > > emit_move_insn_1 from doing the move by subwords as it finds a

> > > > pattern that's able to do the move.

> > > >

> > > > Could you then fix the comment in your patch as well? I hadn't

> > > > understood the force_reg was not key here. You might want to update

> > > > the following sentence from your patch description if you are going

> > > > to include it in your commit message:

> > >

> > > I'll update the comment in the patch. The cover letter won't be

> > > included in the commit, But it does accurately reflect the current

> > > state of affairs. The patch will do the force_reg, It's just not the reason it

> > works.

> > >

> > > >

> > > > The way this is worked around in the back-end is that we have move

> > > > patterns in neon.md that usually just force the register instead of

> > > > checking with the back-end.

> > > >

> > > > "The way this is worked around (..) that just force the register" is

> > > > what led me to believe the force_reg was important.

> > > >

> > > > >

> > > > > The expander however always falls through and doesn’t stop RTL

> > > > > generation. You could remove all the code in there and have it

> > > > > properly match the *neon_mov instructions which will do the right

> > > > > thing later at code generation time and avoid the redundant moves.

> > > > > My

> > > > guess is the original `force_reg` was copied from the other patterns

> > > > like `movti` and the existing `mov<mode>`. There It makes sense

> > > > because the operands can be MEM or anything general_operand.

> > > > >

> > > > > However the redundant moves are a different problem than what I'm

> > > > > trying to solve here. So I think that's another patch which

> > > > > requires further

> > > > testing.

> > > >

> > > > I was just thinking of restricting when does the force_reg happens

> > > > but if it can be removed completely I agree it should probably be

> > > > done in a separate patch.

> > > >

> > > > Oh by the way, is there something that prevent those expander to

> > > > ever be used with a memory operand? Because the GCC internals

> > > > contains the following piece for mov standard pattern (bold marks added

> > by me):

> > > >

> > > > "Second, these patterns are not used solely in the RTL generation

> > > > pass. Even the reload pass can generate move insns to copy values

> > > > from stack slots into temporary registers. When it does so, one of

> > > > the operands is a hard register and the other is an operand that can need

> > to be reloaded into a register.

> > > > Therefore, when given such a pair of operands, the pattern must

> > > > generate RTL which needs no reloading and needs no temporary

> > > > registers—no registers other than the operands. For example, if you

> > > > support the pattern with a define_ expand, then in such a case the

> > > > define_expand *mustn’t call

> > > > force_reg* or any other such function which might generate new

> > > > pseudo registers."

> > >

> > > When used during expand the operand predicates are checked, this

> > prevents that case.

> > > When used during reload can_create_pseudo_p () guards the actions in

> > the patterns.

> > >

> > > gcc/rtl.h:#define can_create_pseudo_p() (!reload_in_progress &&

> > > !reload_completed)

> > >

> > > So during or after reload it would just be an empty fall through

> > > pattern, so it won't do anything to MEM or anything else.

> > >

> >

> > This is ok for trunk.

> > Thanks for looking at it as well Thomas,

> >

> > Kyrill

> >

> > > Regards,

> > > Tamar

> > >

> > > >

> > > > Best regards,

> > > >

> > > > Thomas

> > > >

> > > > >

> > > > > Regards,

> > > > > Tamar

> > > > >

> > > > > > Best regards,

> > > > > >

> > > > > > Thomas

> > > > > >

> > > > > > >

> > > > > > > This patch ensures gcc.target/arm/big-endian-subreg.c is fixed

> > > > > > > without introducing any regressions while fixing

> > > > > > >

> > > > > > > gcc.dg/vect/vect-nop-move.c execution test

> > > > > > > g++.dg/torture/vshuf-v2si.C -O3 -g execution test

> > > > > > > g++.dg/torture/vshuf-v4si.C -O3 -g execution test

> > > > > > > g++.dg/torture/vshuf-v8hi.C -O3 -g execution test

> > > > > > >

> > > > > > > Regtested on armeb-none-eabi and no regressions.

> > > > > > > Bootstrapped on arm-none-linux-gnueabihf and no issues.

> > > > > > >

> > > > > > >

> > > > > > > Ok for trunk?

> > > > > > >

> > > > > > > Thanks,

> > > > > > > Tamar

> > > > > > >

> > > > > > > gcc/

> > > > > > > 2018-07-23  Tamar Christina <tamar.christina@arm.com>

> > > > > > >

> > > > > > >         PR target/84711

> > > > > > >         * config/arm/arm.c (arm_can_change_mode_class):

> > > > > > > Disallow

> > > > subreg.

> > > > > > >         * config/arm/neon.md (movv4hf, movv8hf): Refactored to..

> > > > > > >         (mov<mov>): ..this and enable unconditionally.

> > > > > > >

> > > > > > > --

>

Patch

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index ec3abbcba9fb1ed040bc9cdcf162f5bb4f8df586..8d5897c8f0f93502fb958cba61b7c0b1570d1570 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -31512,8 +31512,8 @@  arm_can_change_mode_class (machine_mode from, machine_mode to,
 {
   if (TARGET_BIG_END
       && !(GET_MODE_SIZE (from) == 16 && GET_MODE_SIZE (to) == 8)
-      && (GET_MODE_UNIT_SIZE (from) > UNITS_PER_WORD
-	  || GET_MODE_UNIT_SIZE (to) > UNITS_PER_WORD)
+      && (GET_MODE_SIZE (from) > UNITS_PER_WORD
+	  || GET_MODE_SIZE (to) > UNITS_PER_WORD)
       && reg_classes_intersect_p (VFP_REGS, rclass))
     return false;
   return true;
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 1646b2172970acaaf949ba8b77d43ec72b688d73..57df17fe18dfeaa82e890fd339e2104ad27ee13b 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -137,10 +137,10 @@ 
     }
 })
 
-(define_expand "movv4hf"
-  [(set (match_operand:V4HF 0 "s_register_operand")
-	(match_operand:V4HF 1 "s_register_operand"))]
-  "TARGET_NEON && TARGET_FP16"
+(define_expand "mov<mode>"
+  [(set (match_operand:VH 0 "s_register_operand")
+	(match_operand:VH 1 "s_register_operand"))]
+  "TARGET_NEON"
 {
   /* We need to use force_reg to avoid TARGET_CAN_CHANGE_MODE_CLASS
      causing an ICE on big-endian because it cannot extract subregs in
@@ -148,22 +148,7 @@ 
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
-	operands[1] = force_reg (V4HFmode, operands[1]);
-    }
-})
-
-(define_expand "movv8hf"
-  [(set (match_operand:V8HF 0 "")
-	(match_operand:V8HF 1 ""))]
-  "TARGET_NEON && TARGET_FP16"
-{ 
-  /* We need to use force_reg to avoid TARGET_CAN_CHANGE_MODE_CLASS
-     causing an ICE on big-endian because it cannot extract subregs in
-     this case.  */
-  if (can_create_pseudo_p ())
-    {
-      if (!REG_P (operands[0]))
-	operands[1] = force_reg (V8HFmode, operands[1]);
+	operands[1] = force_reg (<MODE>mode, operands[1]);
     }
 })