[ARM] : Fix for MVE ACLE intrinsics with writeback (PR94317).

Message ID AM0PR08MB538007365371A43722463E699BC80@AM0PR08MB5380.eurprd08.prod.outlook.com

Commit Message

Srinath Parvathaneni March 31, 2020, 4:13 p.m.
Hello,

The following MVE ACLE intrinsics do not write the updated base address back through their pointer argument:

vldrdq_gather_base_wb_s64, vldrdq_gather_base_wb_u64, vldrdq_gather_base_wb_z_s64,
vldrdq_gather_base_wb_z_u64, vldrwq_gather_base_wb_s32, vldrwq_gather_base_wb_u32,
vldrwq_gather_base_wb_z_s32, vldrwq_gather_base_wb_z_u32, vldrwq_gather_base_wb_f32,
vldrwq_gather_base_wb_z_f32.

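This patch fixes the bug reported in PR94317. Previously these intrinsics only executed "__addr += __offset;", which advances the local pointer and never stores the updated base vector to *__addr. The patch instead uses separate builtin calls for the two outputs: a new "nowb" builtin returns the loaded result, and the "wb" builtin returns the updated base, which is written back to *__addr.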
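To illustrate the intended behaviour, here is a minimal plain-C sketch (not MVE code, and not how the compiler implements it) that models a 4-lane gather load with pre-indexed writeback in the style of vldrwq_gather_base_wb_u32. The names vec4_u32, sim_mem, load_u32, gather_base_wb_fixed, gather_base_wb_buggy and check_models are hypothetical, and lane addresses are assumed to be byte offsets into a small simulated memory rather than real pointers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint32x4_t: four 32-bit lane base addresses.  */
typedef struct { uint32_t lane[4]; } vec4_u32;

/* Simulated memory; lane "addresses" are byte offsets into this array.  */
static uint8_t sim_mem[256];

static uint32_t load_u32 (uint32_t addr)
{
  uint32_t v;
  assert (addr <= sizeof sim_mem - sizeof v);  /* stay inside sim_mem */
  memcpy (&v, &sim_mem[addr], sizeof v);
  return v;
}

/* Intended semantics (VLDRW.U32 Qd, [Qm, #imm]!): each lane loads from
   base + offset, the base lane becomes base + offset, and the updated
   vector is stored back through *addr.  */
static vec4_u32 gather_base_wb_fixed (vec4_u32 *addr, int offset)
{
  vec4_u32 result;
  vec4_u32 base = *addr;
  for (int i = 0; i < 4; i++)
    {
      uint32_t ea = base.lane[i] + (uint32_t) offset;
      result.lane[i] = load_u32 (ea);
      base.lane[i] = ea;
    }
  *addr = base;  /* the writeback the caller must observe */
  return result;
}

/* Pre-patch behaviour: the load is correct, but the header then did
   "__addr += __offset;", which only advances the local pointer copy and
   never stores to *addr.  That statement is omitted here because the
   out-of-object pointer arithmetic would be undefined behaviour.  */
static vec4_u32 gather_base_wb_buggy (vec4_u32 *addr, int offset)
{
  vec4_u32 result;
  for (int i = 0; i < 4; i++)
    result.lane[i] = load_u32 (addr->lane[i] + (uint32_t) offset);
  return result;
}

/* Usage check: fill memory with known words and compare the two models.  */
static void check_models (void)
{
  for (uint32_t w = 0; w < sizeof sim_mem / 4; w++)
    {
      uint32_t val = 1000 + w;  /* word at byte 4*w holds 1000 + w */
      memcpy (&sim_mem[4 * w], &val, sizeof val);
    }
  vec4_u32 a = { { 0, 16, 32, 48 } };
  vec4_u32 b = a;

  vec4_u32 ra = gather_base_wb_fixed (&a, 8);
  vec4_u32 rb = gather_base_wb_buggy (&b, 8);

  assert (ra.lane[0] == 1002 && rb.lane[0] == 1002);  /* same loaded data */
  assert (a.lane[1] == 24);  /* fixed: base advanced by the offset */
  assert (b.lane[1] == 16);  /* buggy: caller's base never changes */
}
```

Both models return the same data; only the fixed one leaves the caller's base vector advanced, which is the difference PR94317 is about.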

Please refer to the M-profile Vector Extension (MVE) intrinsics [1] for more details.
[1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath.

gcc/ChangeLog:

2020-03-31  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	PR target/94317
	* config/arm/arm-builtins.c (LDRGBWBXU_QUALIFIERS): Define.
	(LDRGBWBXU_Z_QUALIFIERS): Likewise.
	* config/arm/arm_mve.h (__arm_vldrdq_gather_base_wb_s64): Modify
	intrinsic definition by adding a new builtin call to writeback into base
	address.
	(__arm_vldrdq_gather_base_wb_u64): Likewise.
	(__arm_vldrdq_gather_base_wb_z_s64): Likewise.
	(__arm_vldrdq_gather_base_wb_z_u64): Likewise.
	(__arm_vldrwq_gather_base_wb_s32): Likewise.
	(__arm_vldrwq_gather_base_wb_u32): Likewise.
	(__arm_vldrwq_gather_base_wb_z_s32): Likewise.
	(__arm_vldrwq_gather_base_wb_z_u32): Likewise.
	(__arm_vldrwq_gather_base_wb_f32): Likewise.
	(__arm_vldrwq_gather_base_wb_z_f32): Likewise.
	* config/arm/arm_mve_builtins.def (vldrwq_gather_base_wb_z_u): Modify
	builtin's qualifier.
	(vldrdq_gather_base_wb_z_u): Likewise.
	(vldrwq_gather_base_wb_u): Likewise.
	(vldrdq_gather_base_wb_u): Likewise.
	(vldrwq_gather_base_wb_z_s): Likewise.
	(vldrwq_gather_base_wb_z_f): Likewise.
	(vldrdq_gather_base_wb_z_s): Likewise.
	(vldrwq_gather_base_wb_s): Likewise.
	(vldrwq_gather_base_wb_f): Likewise.
	(vldrdq_gather_base_wb_s): Likewise.
	(vldrwq_gather_base_nowb_z_u): Define builtin.
	(vldrdq_gather_base_nowb_z_u): Likewise.
	(vldrwq_gather_base_nowb_u): Likewise.
	(vldrdq_gather_base_nowb_u): Likewise.
	(vldrwq_gather_base_nowb_z_s): Likewise.
	(vldrwq_gather_base_nowb_z_f): Likewise.
	(vldrdq_gather_base_nowb_z_s): Likewise.
	(vldrwq_gather_base_nowb_s): Likewise.
	(vldrwq_gather_base_nowb_f): Likewise.
	(vldrdq_gather_base_nowb_s): Likewise.
	* config/arm/mve.md (mve_vldrwq_gather_base_nowb_<supf>v4si): Define RTL
	pattern.
	(mve_vldrwq_gather_base_wb_<supf>v4si): Modify RTL pattern.
	(mve_vldrwq_gather_base_nowb_z_<supf>v4si): Define RTL pattern.
	(mve_vldrwq_gather_base_wb_z_<supf>v4si): Modify RTL pattern.
	(mve_vldrwq_gather_base_wb_fv4sf): Modify RTL pattern.
	(mve_vldrwq_gather_base_nowb_fv4sf): Define RTL pattern.
	(mve_vldrwq_gather_base_wb_z_fv4sf): Modify RTL pattern.
	(mve_vldrwq_gather_base_nowb_z_fv4sf): Define RTL pattern.
	(mve_vldrdq_gather_base_nowb_<supf>v2di): Define RTL pattern.
	(mve_vldrdq_gather_base_wb_<supf>v2di): Modify RTL pattern.
	(mve_vldrdq_gather_base_nowb_z_<supf>v2di): Define RTL pattern.
	(mve_vldrdq_gather_base_wb_z_<supf>v2di): Modify RTL pattern.

gcc/testsuite/ChangeLog:

2020-03-31  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	PR target/94317
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c: Modify.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c: Likewise.
	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c: Likewise.
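A similar hypothetical sketch of the mve.md side: in the patch, the "wb" and "nowb" expanders both emit the same two-output *_insn pattern, and each discards the output it does not need into a fresh pseudo (ignore_result or ignore_wb). The C model below assumes a 2-lane, 64-bit layout in the style of vldrdq_gather_base_wb_u64; gather_wb_insn, gather_nowb, gather_wb, vldrdq_gather_base_wb_model and sim_mem64 are illustrative names, not GCC functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for uint64x2_t: two 64-bit lane base addresses.  */
typedef struct { uint64_t lane[2]; } vec2_u64;

/* Simulated memory of 64-bit words; a lane address A selects word A / 8.  */
static uint64_t sim_mem64[32];

/* Models the shared *_insn pattern: one gather load with writeback that
   yields two outputs, the loaded data and the updated base vector.  */
static void gather_wb_insn (vec2_u64 *data_out, vec2_u64 *base_out,
                            vec2_u64 base, int offset)
{
  for (int i = 0; i < 2; i++)
    {
      uint64_t ea = base.lane[i] + (uint64_t) offset;
      assert (ea % 8 == 0
              && ea / 8 < sizeof sim_mem64 / sizeof sim_mem64[0]);
      data_out->lane[i] = sim_mem64[ea / 8];
      base_out->lane[i] = ea;
    }
}

/* Like the "nowb" expander: keep the data, discard the writeback
   (the role of ignore_wb).  */
static vec2_u64 gather_nowb (vec2_u64 base, int offset)
{
  vec2_u64 data, ignore_wb;
  gather_wb_insn (&data, &ignore_wb, base, offset);
  return data;
}

/* Like the patched "wb" expander: keep the updated base, discard the
   data (the role of ignore_result).  */
static vec2_u64 gather_wb (vec2_u64 base, int offset)
{
  vec2_u64 ignore_result, new_base;
  gather_wb_insn (&ignore_result, &new_base, base, offset);
  return new_base;
}

/* Mirrors the patched header: one builtin call per output.  */
static vec2_u64 vldrdq_gather_base_wb_model (vec2_u64 *addr, int offset)
{
  vec2_u64 result = gather_nowb (*addr, offset);
  *addr = gather_wb (*addr, offset);
  return result;
}
```

Because each expander keeps exactly one output, the header can assign the data to the return value and the updated base to *__addr independently.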



###############     Attachment also inlined for ease of reply    ###############
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 56f0db21ea95dcd738877daba27f1cb60f0d5a32..832b9107424fd9a4a0ee272b773b3d0929172370 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -719,6 +719,17 @@ arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
+arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
+#define LDRGBWBXU_QUALIFIERS (arm_ldrgbwbxu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
+
+static enum arm_type_qualifiers
 arm_ldrgbwbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate};
 #define LDRGBWBS_QUALIFIERS (arm_ldrgbwbs_qualifiers)
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f1dcdc2153217e796c58526ba0e5be11be642234..47a6268e0800958f49d46238fe34ec749d243929 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -13903,8 +13903,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_s64 (uint64x2_t * __addr, const int __offset)
 {
   int64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_sv2di (*__addr, __offset);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset);
   return result;
 }
 
@@ -13913,8 +13913,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_u64 (uint64x2_t * __addr, const int __offset)
 {
   uint64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_uv2di (*__addr, __offset);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset);
   return result;
 }
 
@@ -13923,8 +13923,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_z_s64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
 {
   int64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_z_sv2di (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, __p);
   return result;
 }
 
@@ -13933,8 +13933,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_z_u64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
 {
   uint64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_z_uv2di (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, __offset, __p);
   return result;
 }
 
@@ -13943,8 +13943,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_s32 (uint32x4_t * __addr, const int __offset)
 {
   int32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_sv4si (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset);
   return result;
 }
 
@@ -13953,8 +13953,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_u32 (uint32x4_t * __addr, const int __offset)
 {
   uint32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_uv4si (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset);
   return result;
 }
 
@@ -13963,8 +13963,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_s32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   int32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_sv4si (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, __offset, __p);
   return result;
 }
 
@@ -13973,8 +13973,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_u32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   uint32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_uv4si (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, __offset, __p);
   return result;
 }
 
@@ -19372,8 +19372,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_f32 (uint32x4_t * __addr, const int __offset)
 {
   float32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_fv4sf (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset);
   return result;
 }
 
@@ -19382,8 +19382,8 @@ __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_f32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   float32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_fv4sf (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, __offset, __p);
   return result;
 }
 
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 2fb975944b9fdac9de4b5a1bec3962be410637f1..753e40a951d071c1ab77476a1cc4779e91689178 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -847,16 +847,26 @@ VAR1 (STRSBWBS, vstrdq_scatter_base_wb_s, v2di)
 VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_s, v4si)
 VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_f, v4sf)
 VAR1 (STRSBWBS_P, vstrdq_scatter_base_wb_p_s, v2di)
-VAR1 (LDRGBWBU_Z, vldrwq_gather_base_wb_z_u, v4si)
-VAR1 (LDRGBWBU_Z, vldrdq_gather_base_wb_z_u, v2di)
-VAR1 (LDRGBWBU, vldrwq_gather_base_wb_u, v4si)
-VAR1 (LDRGBWBU, vldrdq_gather_base_wb_u, v2di)
-VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_s, v4si)
-VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_f, v4sf)
-VAR1 (LDRGBWBS_Z, vldrdq_gather_base_wb_z_s, v2di)
-VAR1 (LDRGBWBS, vldrwq_gather_base_wb_s, v4si)
-VAR1 (LDRGBWBS, vldrwq_gather_base_wb_f, v4sf)
-VAR1 (LDRGBWBS, vldrdq_gather_base_wb_s, v2di)
+VAR1 (LDRGBWBU_Z, vldrwq_gather_base_nowb_z_u, v4si)
+VAR1 (LDRGBWBU_Z, vldrdq_gather_base_nowb_z_u, v2di)
+VAR1 (LDRGBWBU, vldrwq_gather_base_nowb_u, v4si)
+VAR1 (LDRGBWBU, vldrdq_gather_base_nowb_u, v2di)
+VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_s, v4si)
+VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_f, v4sf)
+VAR1 (LDRGBWBS_Z, vldrdq_gather_base_nowb_z_s, v2di)
+VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_s, v4si)
+VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_f, v4sf)
+VAR1 (LDRGBWBS, vldrdq_gather_base_nowb_s, v2di)
+VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_s, v2di)
+VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_u, v2di)
+VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_s, v2di)
+VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_u, v2di)
+VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_s, v4si)
+VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_f, v4sf)
+VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_u, v4si)
+VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_s, v4si)
+VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_f, v4sf)
+VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_u, v4si)
 VAR1 (BINOP_NONE_NONE_NONE, vadciq_s, v4si)
 VAR1 (BINOP_UNONE_UNONE_UNONE, vadciq_u, v4si)
 VAR1 (BINOP_NONE_NONE_NONE, vadcq_s, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index df602b07840bb4ccb9aa2a9b10992ba7078452ba..d1028f4542b4972b4080e46544c86d625d77383a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -10420,6 +10420,20 @@
    (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (ignore_result, operands[0],
+						 operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vldrwq_gather_base_nowb_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V4SImode);
   emit_insn (
   gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (operands[0], ignore_wb,
@@ -10459,6 +10473,21 @@
    (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (ignore_result, operands[0],
+						   operands[1], operands[2],
+						   operands[3]));
+  DONE;
+})
+(define_expand "mve_vldrwq_gather_base_nowb_z_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V4SImode);
   emit_insn (
   gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (operands[0], ignore_wb,
@@ -10487,12 +10516,26 @@
    ops[0] = operands[0];
    ops[1] = operands[2];
    ops[2] = operands[3];
-   output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops);
+   output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops);
    return "";
 }
   [(set_attr "length" "8")])
 
 (define_expand "mve_vldrwq_gather_base_wb_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_result = gen_reg_rtx (V4SFmode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_fv4sf_insn (ignore_result, operands[0],
+					    operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vldrwq_gather_base_nowb_fv4sf"
   [(match_operand:V4SF 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
@@ -10531,6 +10574,22 @@
   [(set_attr "length" "4")])
 
 (define_expand "mve_vldrwq_gather_base_wb_z_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_result = gen_reg_rtx (V4SFmode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_z_fv4sf_insn (ignore_result, operands[0],
+					      operands[1], operands[2],
+					      operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vldrwq_gather_base_nowb_z_fv4sf"
   [(match_operand:V4SF 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
@@ -10566,7 +10625,7 @@
    ops[0] = operands[0];
    ops[1] = operands[2];
    ops[2] = operands[3];
-   output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops);
+   output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -10578,6 +10637,20 @@
    (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (ignore_result, operands[0],
+						 operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vldrdq_gather_base_nowb_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand")
+   (match_operand:V2DI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V2DImode);
   emit_insn (
   gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (operands[0], ignore_wb,
@@ -10585,6 +10658,7 @@
   DONE;
 })
 
+
 ;;
 ;; [vldrdq_gather_base_wb_s vldrdq_gather_base_wb_u]
 ;;
@@ -10617,6 +10691,22 @@
    (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (ignore_result, operands[0],
+						   operands[1], operands[2],
+						   operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vldrdq_gather_base_nowb_z_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand")
+   (match_operand:V2DI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V2DImode);
   emit_insn (
   gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (operands[0], ignore_wb,
@@ -10660,7 +10750,7 @@
    ops[0] = operands[0];
    ops[1] = operands[2];
    ops[2] = operands[3];
-   output_asm_insn ("vpst\;\tvldrdt.u64\t%q0, [%q1, %2]!",ops);
+   output_asm_insn ("vpst\;vldrdt.u64\t%q0, [%q1, %2]!",ops);
    return "";
 }
   [(set_attr "length" "8")])
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
index a5c5a61345cb0a46abc7796ceff195698cabe804..0d1ee769ec64b55c7559ce9dc14f8a6ae2e43e34 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
@@ -10,4 +10,6 @@ foo (uint64x2_t * addr)
   return vldrdq_gather_base_wb_s64 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrd.64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
index 442bca92a43c05124717bf6ea0c44672941091f0..cb2a41bdcd32b553a93d3bcc4787d506f1b54f74 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
@@ -10,4 +10,6 @@ foo (uint64x2_t * addr)
   return vldrdq_gather_base_wb_u64 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrd.64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
index 1863d0835e12328b7b7bb824f59e3d441042f56d..243fbeacc3429025202da2ff157ade38a472e123 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
@@ -8,4 +8,8 @@ int64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
     return vldrdq_gather_base_wb_z_s64 (addr, 1016, p);
 }
 
-/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*$" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
index 7ba272a112607b0e57a3d4659e5b4033044af83c..10ba42405fe8fde9d4f8993b20e41a59c7bb2e77 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
@@ -8,4 +8,8 @@ uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
     return vldrdq_gather_base_wb_z_u64 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
index 6b496873f173e30414ffcddf50513758bc8ca770..db8108e37325c4e1fafd2293d48eba0c33309073 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
@@ -10,4 +10,6 @@ foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_f32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
index 9bbbd0d701546b5ec224129aef49e632addea550..3da64e218e2c0789e996be551650033567eba4e5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
@@ -10,4 +10,6 @@ foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_s32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
index 774230b290367a7d28f0c8579be26fc9c75db1cb..2597ee11608bfe21d697f2250bee7e69c0cc7aec 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
@@ -10,4 +10,6 @@ foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_u32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
index 6400f014a88ccf34fef15effff65f9b1267dbd5f..f1ba63855be254d96806c163177e32856294c106 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
@@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_f32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\tP0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
index de7006c51f17665b80b83fd5ea034477b7a7e778..56da5a46c64d2946ceade8689105048e19efdc6a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
@@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_s32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
index 6c9608f07ba966876804f56403a4352a51a0e0c4..63165d97c1a7b4120be036348a09b73afddd36d1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
@@ -10,4 +10,8 @@ foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_u32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */

Comments

Kyrylo Tkachov April 2, 2020, 9:58 a.m. | #1
Hi Srinath,

> -----Original Message-----
> From: Srinath Parvathaneni <Srinath.Parvathaneni@arm.com>
> Sent: 31 March 2020 17:13
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov <Kyrylo.Tkachov@arm.com>; Richard Earnshaw <Richard.Earnshaw@arm.com>
> Subject: [GCC][PATCH][ARM]: Fix for MVE ACLE intrinsics with writeback (PR94317).
>
> Hello,
>
> Following MVE ACLE intrinsics have an issue with writeback to the base address.
>
> vldrdq_gather_base_wb_s64, vldrdq_gather_base_wb_u64,
> vldrdq_gather_base_wb_z_s64, vldrdq_gather_base_wb_z_u64,
> vldrwq_gather_base_wb_s32, vldrwq_gather_base_wb_u32,
> vldrwq_gather_base_wb_z_s32, vldrwq_gather_base_wb_z_u32,
> vldrwq_gather_base_wb_f32, vldrwq_gather_base_wb_z_f32.
>
> This patch fixes the bug reported in PR94317 by adding separate builtin calls
> to update the result and writeback to base address for the above intrinsics.
>
> Please refer to M-profile Vector Extension (MVE) intrinsics [1]  for more
> details.
> [1] https://developer.arm.com/architectures/instruction-sets/simd-isas/helium/mve-intrinsics
>
> Regression tested on arm-none-eabi and found no regressions.
>
> Ok for trunk?


Thanks, I've pushed this patch to master.
Kyrill

> 

> Thanks,

> Srinath.

> 

> gcc/ChangeLog:

> 

> 2020-03-31  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

> 

> 	PR target/94317

> 	* config/arm/arm-builtins.c (LDRGBWBXU_QUALIFIERS): Define.

> 	(LDRGBWBXU_Z_QUALIFIERS): Likewise.

> 	* config/arm/arm_mve.h (__arm_vldrdq_gather_base_wb_s64):

> Modify

> 	intrinsic defintion by adding a new builtin call to writeback into base

> 	address.

> 	(__arm_vldrdq_gather_base_wb_u64): Likewise.

> 	(__arm_vldrdq_gather_base_wb_z_s64): Likewise.

> 	(__arm_vldrdq_gather_base_wb_z_u64): Likewise.

> 	(__arm_vldrwq_gather_base_wb_s32): Likewise.

> 	(__arm_vldrwq_gather_base_wb_u32): Likewise.

> 	(__arm_vldrwq_gather_base_wb_z_s32): Likewise.

> 	(__arm_vldrwq_gather_base_wb_z_u32): Likewise.

> 	(__arm_vldrwq_gather_base_wb_f32): Likewise.

> 	(__arm_vldrwq_gather_base_wb_z_f32): Likewise.

> 	* config/arm/arm_mve_builtins.def (vldrwq_gather_base_wb_z_u):

> Modify

> 	builtin's qualifier.

> 	(vldrdq_gather_base_wb_z_u): Likewise.

> 	(vldrwq_gather_base_wb_u): Likewise.

> 	(vldrdq_gather_base_wb_u): Likewise.

> 	(vldrwq_gather_base_wb_z_s): Likewise.

> 	(vldrwq_gather_base_wb_z_f): Likewise.

> 	(vldrdq_gather_base_wb_z_s): Likewise.

> 	(vldrwq_gather_base_wb_s): Likewise.

> 	(vldrwq_gather_base_wb_f): Likewise.

> 	(vldrdq_gather_base_wb_s): Likewise.

> 	(vldrwq_gather_base_nowb_z_u): Define builtin.

> 	(vldrdq_gather_base_nowb_z_u): Likewise.

> 	(vldrwq_gather_base_nowb_u): Likewise.

> 	(vldrdq_gather_base_nowb_u): Likewise.

> 	(vldrwq_gather_base_nowb_z_s): Likewise.

> 	(vldrwq_gather_base_nowb_z_f): Likewise.

> 	(vldrdq_gather_base_nowb_z_s): Likewise.

> 	(vldrwq_gather_base_nowb_s): Likewise.

> 	(vldrwq_gather_base_nowb_f): Likewise.

> 	(vldrdq_gather_base_nowb_s): Likewise.

> 	* config/arm/mve.md (mve_vldrwq_gather_base_nowb_<supf>v4si):

> Define RTL

> 	pattern.

> 	(mve_vldrwq_gather_base_wb_<supf>v4si): Modify RTL pattern.

> 	(mve_vldrwq_gather_base_nowb_z_<supf>v4si): Define RTL pattern.

> 	(mve_vldrwq_gather_base_wb_z_<supf>v4si): Modify RTL pattern.

> 	(mve_vldrwq_gather_base_wb_fv4sf): Modify RTL pattern.

> 	(mve_vldrwq_gather_base_nowb_fv4sf): Define RTL pattern.

> 	(mve_vldrwq_gather_base_wb_z_fv4sf): Modify RTL pattern.

> 	(mve_vldrwq_gather_base_nowb_z_fv4sf): Define RTL pattern.

> 	(mve_vldrdq_gather_base_nowb_<supf>v4di): Define RTL pattern.

> 	(mve_vldrdq_gather_base_wb_<supf>v4di):  Modify RTL pattern.

> 	(mve_vldrdq_gather_base_nowb_z_<supf>v4di): Define RTL pattern.

> 	(mve_vldrdq_gather_base_wb_z_<supf>v4di):  Modify RTL pattern.

> 

> gcc/testsuite/ChangeLog:

> 

> 2020-03-31  Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

> 

> 	PR target/94317

> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c:

> Modify

> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c:

> Likewise.

> 	* gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c:

> Likewise.

> 

> 

> 

Patch

diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 56f0db21ea95dcd738877daba27f1cb60f0d5a32..832b9107424fd9a4a0ee272b773b3d0929172370 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -719,6 +719,17 @@  arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   (arm_quinop_unone_unone_unone_unone_imm_unone_qualifiers)
 
 static enum arm_type_qualifiers
+arm_ldrgbwbxu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate};
+#define LDRGBWBXU_QUALIFIERS (arm_ldrgbwbxu_qualifiers)
+
+static enum arm_type_qualifiers
+arm_ldrgbwbxu_z_qualifiers[SIMD_MAX_BUILTIN_ARGS]
+  = { qualifier_unsigned, qualifier_unsigned, qualifier_immediate,
+      qualifier_unsigned};
+#define LDRGBWBXU_Z_QUALIFIERS (arm_ldrgbwbxu_z_qualifiers)
+
+static enum arm_type_qualifiers
 arm_ldrgbwbs_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_unsigned, qualifier_immediate};
 #define LDRGBWBS_QUALIFIERS (arm_ldrgbwbs_qualifiers)
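The two new qualifier arrays give the `wb` builtins an unsigned return type: after this patch those builtins return the updated base-address vector, while the renamed `nowb` builtins return the loaded data. As a rough illustration, here is a portable C model of the two halves for the 32-bit case, assuming a simulated byte-addressed memory. `u32x4`, `mem` and the `model_*` functions are hypothetical names, not part of `arm_mve.h`, and the model ignores predication and address range checks.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-in for uint32x4_t; not the arm_mve.h type.  */
typedef struct { uint32_t lane[4]; } u32x4;

/* Simulated memory: a "base address" is a byte offset into this array.
   The model assumes every base + offset stays inside it.  */
static uint8_t mem[64];

/* Like a nowb builtin: gather one 32-bit word per lane from base + offset
   and return the loaded data; the base vector is left untouched.  */
static u32x4 model_gather_nowb (u32x4 base, int offset)
{
  u32x4 data;
  for (int i = 0; i < 4; i++)
    memcpy (&data.lane[i], &mem[base.lane[i] + (uint32_t) offset],
            sizeof data.lane[i]);
  return data;
}

/* Like a wb builtin after this patch: return the updated base addresses
   (base + offset in every lane) instead of the loaded data.  */
static u32x4 model_gather_wb (u32x4 base, int offset)
{
  u32x4 updated;
  for (int i = 0; i < 4; i++)
    updated.lane[i] = base.lane[i] + (uint32_t) offset;
  return updated;
}

/* Mirrors the shape of the fixed intrinsic body: the result comes from
   the nowb half, the writeback to *addr from the wb half.  */
static u32x4 model_vldrwq_gather_base_wb_u32 (u32x4 *addr, int offset)
{
  u32x4 result = model_gather_nowb (*addr, offset);
  *addr = model_gather_wb (*addr, offset);
  return result;
}
```

Because each lane's new base is simply `base + offset`, calling the model repeatedly with the same pointer walks the gather forward by `offset` bytes per call, which is what the writeback form exists for.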
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
index f1dcdc2153217e796c58526ba0e5be11be642234..47a6268e0800958f49d46238fe34ec749d243929 100644
--- a/gcc/config/arm/arm_mve.h
+++ b/gcc/config/arm/arm_mve.h
@@ -13903,8 +13903,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_s64 (uint64x2_t * __addr, const int __offset)
 {
   int64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_sv2di (*__addr, __offset);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_sv2di (*__addr, __offset);
   return result;
 }
 
@@ -13913,8 +13913,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_u64 (uint64x2_t * __addr, const int __offset)
 {
   uint64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_uv2di (*__addr, __offset);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_uv2di (*__addr, __offset);
   return result;
 }
 
@@ -13923,8 +13923,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_z_s64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
 {
   int64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_z_sv2di (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_z_sv2di (*__addr, __offset, __p);
   return result;
 }
 
@@ -13933,8 +13933,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrdq_gather_base_wb_z_u64 (uint64x2_t * __addr, const int __offset, mve_pred16_t __p)
 {
   uint64x2_t
-  result = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrdq_gather_base_nowb_z_uv2di (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrdq_gather_base_wb_z_uv2di (*__addr, __offset, __p);
   return result;
 }
 
@@ -13943,8 +13943,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_s32 (uint32x4_t * __addr, const int __offset)
 {
   int32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_sv4si (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_sv4si (*__addr, __offset);
   return result;
 }
 
@@ -13953,8 +13953,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_u32 (uint32x4_t * __addr, const int __offset)
 {
   uint32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_uv4si (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_uv4si (*__addr, __offset);
   return result;
 }
 
@@ -13963,8 +13963,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_s32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   int32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_sv4si (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_sv4si (*__addr, __offset, __p);
   return result;
 }
 
@@ -13973,8 +13973,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_u32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   uint32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_uv4si (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_uv4si (*__addr, __offset, __p);
   return result;
 }
 
@@ -19372,8 +19372,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_f32 (uint32x4_t * __addr, const int __offset)
 {
   float32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_fv4sf (*__addr, __offset);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_fv4sf (*__addr, __offset);
   return result;
 }
 
@@ -19382,8 +19382,8 @@  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
 __arm_vldrwq_gather_base_wb_z_f32 (uint32x4_t * __addr, const int __offset, mve_pred16_t __p)
 {
   float32x4_t
-  result = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, __offset, __p);
-  __addr += __offset;
+  result = __builtin_mve_vldrwq_gather_base_nowb_z_fv4sf (*__addr, __offset, __p);
+  *__addr = __builtin_mve_vldrwq_gather_base_wb_z_fv4sf (*__addr, __offset, __p);
   return result;
 }
 
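The bug in the old header bodies can be reproduced in plain C: `__addr += __offset;` only advances the function's own copy of the pointer, so the caller's base vector never changes. The sketch below contrasts that with the patched form, which stores the updated vector through the pointer. `u32x4`, `old_writeback` and `new_writeback` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

/* Hypothetical portable stand-in for an MVE base-address vector.  */
typedef struct { uint32_t lane[4]; } u32x4;

/* Old header behaviour: "__addr += __offset" moves only the local copy of
   the pointer (by offset *elements*, not bytes); nothing is stored, so the
   caller's base vector is unchanged.  */
static void old_writeback (u32x4 *addr, int offset)
{
  addr += offset;
  (void) addr;
}

/* Patched behaviour: compute base + offset per lane and store it back
   through the pointer, as "*__addr = __builtin_..._wb_... (...)" does.  */
static void new_writeback (u32x4 *addr, int offset)
{
  u32x4 updated;
  for (int i = 0; i < 4; i++)
    updated.lane[i] = addr->lane[i] + (uint32_t) offset;
  *addr = updated;
}
```

Calling `old_writeback` leaves every lane of the caller's vector unchanged, while `new_writeback` adds `offset` to each lane. With a pointer to a single vector, the old `addr += offset` would also step past the object for `offset > 1`, which C leaves undefined.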
diff --git a/gcc/config/arm/arm_mve_builtins.def b/gcc/config/arm/arm_mve_builtins.def
index 2fb975944b9fdac9de4b5a1bec3962be410637f1..753e40a951d071c1ab77476a1cc4779e91689178 100644
--- a/gcc/config/arm/arm_mve_builtins.def
+++ b/gcc/config/arm/arm_mve_builtins.def
@@ -847,16 +847,26 @@  VAR1 (STRSBWBS, vstrdq_scatter_base_wb_s, v2di)
 VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_s, v4si)
 VAR1 (STRSBWBS_P, vstrwq_scatter_base_wb_p_f, v4sf)
 VAR1 (STRSBWBS_P, vstrdq_scatter_base_wb_p_s, v2di)
-VAR1 (LDRGBWBU_Z, vldrwq_gather_base_wb_z_u, v4si)
-VAR1 (LDRGBWBU_Z, vldrdq_gather_base_wb_z_u, v2di)
-VAR1 (LDRGBWBU, vldrwq_gather_base_wb_u, v4si)
-VAR1 (LDRGBWBU, vldrdq_gather_base_wb_u, v2di)
-VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_s, v4si)
-VAR1 (LDRGBWBS_Z, vldrwq_gather_base_wb_z_f, v4sf)
-VAR1 (LDRGBWBS_Z, vldrdq_gather_base_wb_z_s, v2di)
-VAR1 (LDRGBWBS, vldrwq_gather_base_wb_s, v4si)
-VAR1 (LDRGBWBS, vldrwq_gather_base_wb_f, v4sf)
-VAR1 (LDRGBWBS, vldrdq_gather_base_wb_s, v2di)
+VAR1 (LDRGBWBU_Z, vldrwq_gather_base_nowb_z_u, v4si)
+VAR1 (LDRGBWBU_Z, vldrdq_gather_base_nowb_z_u, v2di)
+VAR1 (LDRGBWBU, vldrwq_gather_base_nowb_u, v4si)
+VAR1 (LDRGBWBU, vldrdq_gather_base_nowb_u, v2di)
+VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_s, v4si)
+VAR1 (LDRGBWBS_Z, vldrwq_gather_base_nowb_z_f, v4sf)
+VAR1 (LDRGBWBS_Z, vldrdq_gather_base_nowb_z_s, v2di)
+VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_s, v4si)
+VAR1 (LDRGBWBS, vldrwq_gather_base_nowb_f, v4sf)
+VAR1 (LDRGBWBS, vldrdq_gather_base_nowb_s, v2di)
+VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_s, v2di)
+VAR1 (LDRGBWBXU_Z, vldrdq_gather_base_wb_z_u, v2di)
+VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_s, v2di)
+VAR1 (LDRGBWBXU, vldrdq_gather_base_wb_u, v2di)
+VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_s, v4si)
+VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_f, v4sf)
+VAR1 (LDRGBWBXU_Z, vldrwq_gather_base_wb_z_u, v4si)
+VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_s, v4si)
+VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_f, v4sf)
+VAR1 (LDRGBWBXU, vldrwq_gather_base_wb_u, v4si)
 VAR1 (BINOP_NONE_NONE_NONE, vadciq_s, v4si)
 VAR1 (BINOP_UNONE_UNONE_UNONE, vadciq_u, v4si)
 VAR1 (BINOP_NONE_NONE_NONE, vadcq_s, v4si)
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
index df602b07840bb4ccb9aa2a9b10992ba7078452ba..d1028f4542b4972b4080e46544c86d625d77383a 100644
--- a/gcc/config/arm/mve.md
+++ b/gcc/config/arm/mve.md
@@ -10420,6 +10420,20 @@ 
    (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (ignore_result, operands[0],
+						 operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vldrwq_gather_base_nowb_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V4SImode);
   emit_insn (
   gen_mve_vldrwq_gather_base_wb_<supf>v4si_insn (operands[0], ignore_wb,
@@ -10459,6 +10473,21 @@ 
    (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V4SImode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (ignore_result, operands[0],
+						   operands[1], operands[2],
+						   operands[3]));
+  DONE;
+})
+(define_expand "mve_vldrwq_gather_base_nowb_z_<supf>v4si"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VLDRWGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V4SImode);
   emit_insn (
   gen_mve_vldrwq_gather_base_wb_z_<supf>v4si_insn (operands[0], ignore_wb,
@@ -10487,12 +10516,26 @@ 
    ops[0] = operands[0];
    ops[1] = operands[2];
    ops[2] = operands[3];
-   output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops);
+   output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops);
    return "";
 }
   [(set_attr "length" "8")])
 
 (define_expand "mve_vldrwq_gather_base_wb_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_result = gen_reg_rtx (V4SFmode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_fv4sf_insn (ignore_result, operands[0],
+					    operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vldrwq_gather_base_nowb_fv4sf"
   [(match_operand:V4SF 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
@@ -10531,6 +10574,22 @@ 
   [(set_attr "length" "4")])
 
 (define_expand "mve_vldrwq_gather_base_wb_z_fv4sf"
+  [(match_operand:V4SI 0 "s_register_operand")
+   (match_operand:V4SI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V4SI [(const_int 0)] VLDRWQGBWB_F)]
+  "TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT"
+{
+  rtx ignore_result = gen_reg_rtx (V4SFmode);
+  emit_insn (
+  gen_mve_vldrwq_gather_base_wb_z_fv4sf_insn (ignore_result, operands[0],
+					      operands[1], operands[2],
+					      operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vldrwq_gather_base_nowb_z_fv4sf"
   [(match_operand:V4SF 0 "s_register_operand")
    (match_operand:V4SI 1 "s_register_operand")
    (match_operand:SI 2 "mve_vldrd_immediate")
@@ -10566,7 +10625,7 @@ 
    ops[0] = operands[0];
    ops[1] = operands[2];
    ops[2] = operands[3];
-   output_asm_insn ("vpst\;\tvldrwt.u32\t%q0, [%q1, %2]!",ops);
+   output_asm_insn ("vpst\;vldrwt.u32\t%q0, [%q1, %2]!",ops);
    return "";
 }
   [(set_attr "length" "8")])
@@ -10578,6 +10637,20 @@ 
    (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (ignore_result, operands[0],
+						 operands[1], operands[2]));
+  DONE;
+})
+
+(define_expand "mve_vldrdq_gather_base_nowb_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand")
+   (match_operand:V2DI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V2DImode);
   emit_insn (
   gen_mve_vldrdq_gather_base_wb_<supf>v2di_insn (operands[0], ignore_wb,
@@ -10585,6 +10658,7 @@ 
   DONE;
 })
 
+
 ;;
 ;; [vldrdq_gather_base_wb_s vldrdq_gather_base_wb_u]
 ;;
@@ -10617,6 +10691,22 @@ 
    (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
   "TARGET_HAVE_MVE"
 {
+  rtx ignore_result = gen_reg_rtx (V2DImode);
+  emit_insn (
+  gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (ignore_result, operands[0],
+						   operands[1], operands[2],
+						   operands[3]));
+  DONE;
+})
+
+(define_expand "mve_vldrdq_gather_base_nowb_z_<supf>v2di"
+  [(match_operand:V2DI 0 "s_register_operand")
+   (match_operand:V2DI 1 "s_register_operand")
+   (match_operand:SI 2 "mve_vldrd_immediate")
+   (match_operand:HI 3 "vpr_register_operand")
+   (unspec:V2DI [(const_int 0)] VLDRDGBWBQ)]
+  "TARGET_HAVE_MVE"
+{
   rtx ignore_wb = gen_reg_rtx (V2DImode);
   emit_insn (
   gen_mve_vldrdq_gather_base_wb_z_<supf>v2di_insn (operands[0], ignore_wb,
@@ -10660,7 +10750,7 @@ 
    ops[0] = operands[0];
    ops[1] = operands[2];
    ops[2] = operands[3];
-   output_asm_insn ("vpst\;\tvldrdt.u64\t%q0, [%q1, %2]!",ops);
+   output_asm_insn ("vpst\;vldrdt.u64\t%q0, [%q1, %2]!",ops);
    return "";
 }
   [(set_attr "length" "8")])
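The mve.md hunks follow one pattern: each existing `_wb` expander now emits the two-output `..._insn` pattern and keeps only the updated base (the loaded data goes to a fresh `ignore_result` pseudo), while each new `_nowb` expander emits the same `..._insn` and keeps only the loaded data (the base goes to `ignore_wb`). The header then calls both builtins with the same `*__addr` and `__offset`. The C sketch below mirrors that structure for the unpredicated 32-bit case; `u32x4`, `mem`, `model_insn`, `model_nowb`, `model_wb` and `model_intrinsic` are hypothetical names chosen for illustration, and the lane semantics assume the pre-indexed `[qN, #imm]!` form.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for uint32x4_t.  */
typedef struct { uint32_t lane[4]; } u32x4;

/* Models the define_insn: one instruction, two outputs.  *DATA gets the
   gathered words, *WB the updated base.  Addresses are byte offsets into
   the hypothetical MEM array.  */
static void
model_insn (const uint8_t *mem, u32x4 *data, u32x4 *wb,
            u32x4 base, int32_t offset)
{
  for (int i = 0; i < 4; i++)
    {
      uint32_t ea = base.lane[i] + (uint32_t) offset;
      memcpy (&data->lane[i], mem + ea, sizeof (uint32_t));
      wb->lane[i] = ea;
    }
}

/* Like a "_nowb" expander: keep the data, discard the writeback.  */
static u32x4
model_nowb (const uint8_t *mem, u32x4 base, int32_t offset)
{
  u32x4 data, ignore_wb;
  model_insn (mem, &data, &ignore_wb, base, offset);
  return data;
}

/* Like a "_wb" expander: keep the writeback, discard the data.  */
static u32x4
model_wb (const uint8_t *mem, u32x4 base, int32_t offset)
{
  u32x4 ignore_result, wb;
  model_insn (mem, &ignore_result, &wb, base, offset);
  return wb;
}

/* Mirrors the patched header: both calls see the original *ADDR.  */
static u32x4
model_intrinsic (const uint8_t *mem, u32x4 *addr, int32_t offset)
{
  u32x4 result = model_nowb (mem, *addr, offset);
  *addr = model_wb (mem, *addr, offset);
  return result;
}
```

The ordering matters: the data load is computed before `*addr` is overwritten, so both halves use the same base, matching the single-instruction semantics.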
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
index a5c5a61345cb0a46abc7796ceff195698cabe804..0d1ee769ec64b55c7559ce9dc14f8a6ae2e43e34 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_s64.c
@@ -10,4 +10,6 @@  foo (uint64x2_t * addr)
   return vldrdq_gather_base_wb_s64 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrd.64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
index 442bca92a43c05124717bf6ea0c44672941091f0..cb2a41bdcd32b553a93d3bcc4787d506f1b54f74 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_u64.c
@@ -10,4 +10,6 @@  foo (uint64x2_t * addr)
   return vldrdq_gather_base_wb_u64 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrd.64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrd.64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
index 1863d0835e12328b7b7bb824f59e3d441042f56d..243fbeacc3429025202da2ff157ade38a472e123 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_s64.c
@@ -8,4 +8,8 @@  int64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
     return vldrdq_gather_base_wb_z_s64 (addr, 1016, p);
 }
 
-/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
index 7ba272a112607b0e57a3d4659e5b4033044af83c..10ba42405fe8fde9d4f8993b20e41a59c7bb2e77 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrdq_gather_base_wb_z_u64.c
@@ -8,4 +8,8 @@  uint64x2_t foo (uint64x2_t * addr, mve_pred16_t p)
     return vldrdq_gather_base_wb_z_u64 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrdt.u64"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrdt.u64\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
index 6b496873f173e30414ffcddf50513758bc8ca770..db8108e37325c4e1fafd2293d48eba0c33309073 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_f32.c
@@ -10,4 +10,6 @@  foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_f32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
index 9bbbd0d701546b5ec224129aef49e632addea550..3da64e218e2c0789e996be551650033567eba4e5 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_s32.c
@@ -10,4 +10,6 @@  foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_s32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
index 774230b290367a7d28f0c8579be26fc9c75db1cb..2597ee11608bfe21d697f2250bee7e69c0cc7aec 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_u32.c
@@ -10,4 +10,6 @@  foo (uint32x4_t * addr)
   return vldrwq_gather_base_wb_u32 (addr, 8);
 }
 
-/* { dg-final { scan-assembler "vldrw.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vldrw.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
index 6400f014a88ccf34fef15effff65f9b1267dbd5f..f1ba63855be254d96806c163177e32856294c106 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_f32.c
@@ -10,4 +10,8 @@  foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_f32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
index de7006c51f17665b80b83fd5ea034477b7a7e778..56da5a46c64d2946ceade8689105048e19efdc6a 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_s32.c
@@ -10,4 +10,8 @@  foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_s32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
index 6c9608f07ba966876804f56403a4352a51a0e0c4..63165d97c1a7b4120be036348a09b73afddd36d1 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vldrwq_gather_base_wb_z_u32.c
@@ -10,4 +10,8 @@  foo (uint32x4_t * addr, mve_pred16_t p)
   return vldrwq_gather_base_wb_z_u32 (addr, 8, p);
 }
 
-/* { dg-final { scan-assembler "vldrwt.u32"  }  } */
+/* { dg-final { scan-assembler "vldrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */
+/* { dg-final { scan-assembler "vmsr\t P0, r\[0-9\]+.*" } } */
+/* { dg-final { scan-assembler "vpst" } } */
+/* { dg-final { scan-assembler "vldrwt.u32\tq\[0-9\]+, \\\[q\[0-9\]+, #\[0-9\]+\\\]!" } } */
+/* { dg-final { scan-assembler "vstrb.8 q\[0-9\]+, \\\[r\[0-9\]+\\\]" } } */