[ARM,1/x] : MVE ACLE intrinsics framework patch.

Message ID DBBPR08MB4775F9F4A174B7BFD2021C9C9B710@DBBPR08MB4775.eurprd08.prod.outlook.com
State New
Headers show
Series
  • [ARM,1/x] : MVE ACLE intrinsics framework patch.
Related show

Commit Message

Srinath Parvathaneni Nov. 14, 2019, 7:12 p.m.
Hello,

This patch creates the required framework for MVE ACLE intrinsics.

The following changes are done in this patch to support MVE ACLE intrinsics.

Header file arm_mve.h is added to source code, which contains the definitions of MVE ACLE intrinsics
and different data types used in MVE. Machine description file mve.md is also added which contains the
RTL patterns defined for MVE.

A new reigster "p0" is added which is used in by MVE predicated patterns. A new register class "VPR_REG"
is added and its contents are defined in REG_CLASS_CONTENTS.

The vec-common.md file is modified to support the standard move patterns. The prefix of neon functions
which are also used by MVE is changed from "neon_" to "simd_".
eg: neon_immediate_valid_for_move changed to simd_immediate_valid_for_move.

In the patch standard patterns mve_move, mve_store and move_load for MVE are added and neon.md and vfp.md
files are modified to support this common patterns.

Please refer to Arm reference manual [1] for more details.

[1] https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914

Regression tested on arm-none-eabi and found no regressions.

Ok for trunk?

Thanks,
Srinath

gcc/ChangeLog:

2019-11-11  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* config.gcc (arm_mve.h): Add header file.
	* config/arm/aout.h (p0): Add new register name.
	* config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define.
	(ARM_BUILTIN_NEON_LANE_CHECK): Remove.
	(arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check.
	(arm_init_neon_builtins): Move a check to arm_init_builtins function.
	(arm_init_builtins): Move a check from arm_init_neon_builtins function.
	(mve_dereference_pointer): Add new function.
	(arm_expand_builtin_args): Add TARGET_HAVE_MVE check.
	(arm_expand_neon_builtin): Move a check to arm_expand_builtin function.
	(arm_expand_builtin): Move a check from arm_expand_neon_builtin function.
	* config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE.
	* config/arm/arm-modes.def (INT_MODE): Add three new integer modes.
	* config/arm/arm-protos.h (neon_immediate_valid_for_move): Rename function.
	(simd_immediate_valid_for_move): Rename neon_immediate_valid_for_move function.
	* config/arm/arm.c (arm_options_perform_arch_sanity_checks):Enable mve isa bit.
	(use_return_insn): Add TARGET_HAVE_MVE check.
	(aapcs_vfp_allocate): Add TARGET_HAVE_MVE check.
	(aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check.
	(thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check.
	(arm_rtx_costs_internal): Add TARGET_HAVE_MVE check.
	(neon_valid_immediate): Rename to simd_valid_immediate.
	(simd_valid_immediate): Rename from neon_valid_immediate.
	(neon_immediate_valid_for_move): Rename to simd_immediate_valid_for_move.
	(simd_immediate_valid_for_move): Rename from neon_immediate_valid_for_move.
	(neon_immediate_valid_for_logic): Modify call to neon_valid_immediate function.
	(neon_make_constant): Modify call to neon_valid_immediate function.
	(neon_vector_mem_operand): Add TARGET_HAVE_MVE check.
	(output_move_neon): Add TARGET_HAVE_MVE check.
	(arm_compute_frame_layout): Add TARGET_HAVE_MVE check.
	(arm_save_coproc_regs): Add TARGET_HAVE_MVE check.
	(arm_print_operand): Add case 'E' to print memory operands.
	(arm_print_operand_address): Add TARGET_HAVE_MVE check.
	(arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check.
	(arm_modes_tieable_p): Add TARGET_HAVE_MVE check.
	(arm_regno_class): Add VPR_REGNUM check.
	(arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check.
	(arm_expand_epilogue): Add TARGET_HAVE_MVE check.
	(arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for MVE vector modes.
	(arm_array_mode_supported_p): Add TARGET_HAVE_MVE check.
	(arm_conditional_register_usage): For TARGET_HAVE_MVE enable VPR register.
	* config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR register.
	(FIRST_PSEUDO_REGISTER): Modify.
	(VALID_MVE_MODE): Define.
	(VALID_MVE_SI_MODE): Define.
	(VALID_MVE_SF_MODE): Define.
	(VALID_MVE_STRUCT_MODE): Define.
	(REG_ALLOC_ORDER): Add VPR_REGNUM entry.
	(enum reg_class): Add VPR_REG entry.
	(REG_CLASS_NAMES): Add VPR_REG entry.
	* config/arm/arm.md (VPR_REGNUM): Define.
	(arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE.
	(vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check to allow writeback.
	(include "mve.md"): Include mve.md file.
	* config/arm/arm_mve.h: New file.
	* config/arm/constraints.md (Up): Define.
	* config/arm/iterators.md (VNIM1): Define.
	(VNINOTM1): Define.
	(VSTRUCT): Modify.
	* config/arm/mve.md: New file.
	* config/arm/neon.md:
	(mov<mode>): Add TARGET_HAVE_MVE check.
	(movv4hf): Define.
	(neon_mov<mode>): Add TARGET_HAVE_MVE check.
	(define_split): Add TARGET_HAVE_MVE check.
	(vec_init<mode><V_elem_l>): Add TARGET_HAVE_MVE check.
	* config/arm/predicates.md (vpr_register_operand): Define.
	* config/arm/t-arm: Add mve.md file.
	* config/arm/types.md: Add MVE instructions mve_move, mve_load, mve_store.
	* config/arm/vec-common.md (mov<mode>): Add TARGET_HAVE_MVE check.
	(mov<mode>): Modify iterator.
	(movv8hf): Define

gcc/testsuite/ChangeLog:

2019-11-11  Andre Vieira  <andre.simoesdiasvieira@arm.com>
	    Mihail Ionescu  <mihail.ionescu@arm.com>
	    Srinath Parvathaneni  <srinath.parvathaneni@arm.com>

	* gcc.target/arm/mve/intrinsics/mve_vector_float.c: New test.
	* gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_int.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_uint.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_uint1.c: Likewise.
	* gcc.target/arm/mve/intrinsics/mve_vector_uint2.c: Likewise.
	* gcc.target/arm/mve/mve.exp: New file.


###############     Attachment also inlined for ease of reply    ###############

Comments

Kyrill Tkachov Dec. 18, 2019, 5:16 p.m. | #1
On 11/14/19 7:12 PM, Srinath Parvathaneni wrote:
> Hello,

>

> This patch creates the required framework for MVE ACLE intrinsics.

>

> The following changes are done in this patch to support MVE ACLE 

> intrinsics.

>

> Header file arm_mve.h is added to source code, which contains the 

> definitions of MVE ACLE intrinsics

> and different data types used in MVE. Machine description file mve.md 

> is also added which contains the

> RTL patterns defined for MVE.

>

> A new reigster "p0" is added which is used in by MVE predicated 

> patterns. A new register class "VPR_REG"

> is added and its contents are defined in REG_CLASS_CONTENTS.

>

> The vec-common.md file is modified to support the standard move 

> patterns. The prefix of neon functions

> which are also used by MVE is changed from "neon_" to "simd_".

> eg: neon_immediate_valid_for_move changed to 

> simd_immediate_valid_for_move.

>

> In the patch standard patterns mve_move, mve_store and move_load for 

> MVE are added and neon.md and vfp.md

> files are modified to support this common patterns.

>

> Please refer to Arm reference manual [1] for more details.

>

> [1] 

> https://static.docs.arm.com/ddi0553/bh/DDI0553B_h_armv8m_arm.pdf?_ga=2.102521798.659307368.1572453718-1501600630.1548848914

>

> Regression tested on arm-none-eabi and found no regressions.

>

> Ok for trunk?


Ok.

Thanks,

Kyrill

>

> Thanks,

> Srinath

>

> gcc/ChangeLog:

>

> 2019-11-11  Andre Vieira <andre.simoesdiasvieira@arm.com>

>             Mihail Ionescu  <mihail.ionescu@arm.com>

>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>

>

>         * config.gcc (arm_mve.h): Add header file.

>         * config/arm/aout.h (p0): Add new register name.

>         * config/arm-builtins.c (ARM_BUILTIN_SIMD_LANE_CHECK): Define.

>         (ARM_BUILTIN_NEON_LANE_CHECK): Remove.

>         (arm_init_simd_builtin_types): Add TARGET_HAVE_MVE check.

>         (arm_init_neon_builtins): Move a check to arm_init_builtins 

> function.

>         (arm_init_builtins): Move a check from arm_init_neon_builtins 

> function.

>         (mve_dereference_pointer): Add new function.

>         (arm_expand_builtin_args): Add TARGET_HAVE_MVE check.

>         (arm_expand_neon_builtin): Move a check to arm_expand_builtin 

> function.

>         (arm_expand_builtin): Move a check from 

> arm_expand_neon_builtin function.

>         * config/arm/arm-c.c (arm_cpu_builtins): Define macros for MVE.

>         * config/arm/arm-modes.def (INT_MODE): Add three new integer 

> modes.

>         * config/arm/arm-protos.h (neon_immediate_valid_for_move): 

> Rename function.

>         (simd_immediate_valid_for_move): Rename 

> neon_immediate_valid_for_move function.

>         * config/arm/arm.c 

> (arm_options_perform_arch_sanity_checks):Enable mve isa bit.

>         (use_return_insn): Add TARGET_HAVE_MVE check.

>         (aapcs_vfp_allocate): Add TARGET_HAVE_MVE check.

>         (aapcs_vfp_allocate_return_reg): Add TARGET_HAVE_MVE check.

>         (thumb2_legitimate_address_p): Add TARGET_HAVE_MVE check.

>         (arm_rtx_costs_internal): Add TARGET_HAVE_MVE check.

>         (neon_valid_immediate): Rename to simd_valid_immediate.

>         (simd_valid_immediate): Rename from neon_valid_immediate.

>         (neon_immediate_valid_for_move): Rename to 

> simd_immediate_valid_for_move.

>         (simd_immediate_valid_for_move): Rename from 

> neon_immediate_valid_for_move.

>         (neon_immediate_valid_for_logic): Modify call to 

> neon_valid_immediate function.

>         (neon_make_constant): Modify call to neon_valid_immediate 

> function.

>         (neon_vector_mem_operand): Add TARGET_HAVE_MVE check.

>         (output_move_neon): Add TARGET_HAVE_MVE check.

>         (arm_compute_frame_layout): Add TARGET_HAVE_MVE check.

>         (arm_save_coproc_regs): Add TARGET_HAVE_MVE check.

>         (arm_print_operand): Add case 'E' to print memory operands.

>         (arm_print_operand_address): Add TARGET_HAVE_MVE check.

>         (arm_hard_regno_mode_ok): Add TARGET_HAVE_MVE check.

>         (arm_modes_tieable_p): Add TARGET_HAVE_MVE check.

>         (arm_regno_class): Add VPR_REGNUM check.

>         (arm_expand_epilogue_apcs_frame): Add TARGET_HAVE_MVE check.

>         (arm_expand_epilogue): Add TARGET_HAVE_MVE check.

>         (arm_vector_mode_supported_p): Add TARGET_HAVE_MVE check for 

> MVE vector modes.

>         (arm_array_mode_supported_p): Add TARGET_HAVE_MVE check.

>         (arm_conditional_register_usage): For TARGET_HAVE_MVE enable 

> VPR register.

>         * config/arm/arm.h (IS_VPR_REGNUM): Macro to check for VPR 

> register.

>         (FIRST_PSEUDO_REGISTER): Modify.

>         (VALID_MVE_MODE): Define.

>         (VALID_MVE_SI_MODE): Define.

>         (VALID_MVE_SF_MODE): Define.

>         (VALID_MVE_STRUCT_MODE): Define.

>         (REG_ALLOC_ORDER): Add VPR_REGNUM entry.

>         (enum reg_class): Add VPR_REG entry.

>         (REG_CLASS_NAMES): Add VPR_REG entry.

>         * config/arm/arm.md (VPR_REGNUM): Define.

>         (arm_movsf_soft_insn): Add TARGET_HAVE_MVE check to not allow MVE.

>         (vfp_pop_multiple_with_writeback): Add TARGET_HAVE_MVE check 

> to allow writeback.

>         (include "mve.md"): Include mve.md file.

>         * config/arm/arm_mve.h: New file.

>         * config/arm/constraints.md (Up): Define.

>         * config/arm/iterators.md (VNIM1): Define.

>         (VNINOTM1): Define.

>         (VSTRUCT): Modify.

>         * config/arm/mve.md: New file.

>         * config/arm/neon.md:

>         (mov<mode>): Add TARGET_HAVE_MVE check.

>         (movv4hf): Define.

>         (neon_mov<mode>): Add TARGET_HAVE_MVE check.

>         (define_split): Add TARGET_HAVE_MVE check.

>         (vec_init<mode><V_elem_l>): Add TARGET_HAVE_MVE check.

>         * config/arm/predicates.md (vpr_register_operand): Define.

>         * config/arm/t-arm: Add mve.md file.

>         * config/arm/types.md: Add MVE instructions mve_move, 

> mve_load, mve_store.

>         * config/arm/vec-common.md (mov<mode>): Add TARGET_HAVE_MVE check.

>         (mov<mode>): Modify iterator.

>         (movv8hf): Define

>

> gcc/testsuite/ChangeLog:

>

> 2019-11-11  Andre Vieira <andre.simoesdiasvieira@arm.com>

>             Mihail Ionescu  <mihail.ionescu@arm.com>

>             Srinath Parvathaneni <srinath.parvathaneni@arm.com>

>

>         * gcc.target/arm/mve/intrinsics/mve_vector_float.c: New test.

>         * gcc.target/arm/mve/intrinsics/mve_vector_float1.c: Likewise.

>         * gcc.target/arm/mve/intrinsics/mve_vector_float2.c: Likewise.

>         * gcc.target/arm/mve/intrinsics/mve_vector_int.c: Likewise.

>         * gcc.target/arm/mve/intrinsics/mve_vector_int1.c: Likewise.

>         * gcc.target/arm/mve/intrinsics/mve_vector_int2.c: Likewise.

>         * gcc.target/arm/mve/intrinsics/mve_vector_uint.c: Likewise.

>         * gcc.target/arm/mve/intrinsics/mve_vector_uint1.c: Likewise.

>         * gcc.target/arm/mve/intrinsics/mve_vector_uint2.c: Likewise.

>         * gcc.target/arm/mve/mve.exp: New file.

>

>

> ###############     Attachment also inlined for ease of reply    

> ###############

>

>

> diff --git a/gcc/config.gcc b/gcc/config.gcc

> index 

> 72f656408f11802c669c3de953bf3020020ca312..c4a7d984936c531d7dfcce347d56b5931913e68b 

> 100644

> --- a/gcc/config.gcc

> +++ b/gcc/config.gcc

> @@ -344,7 +344,7 @@ arc*-*-*)

>  arm*-*-*)

>          cpu_type=arm

>          extra_objs="arm-builtins.o aarch-common.o"

> -       extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h 

> arm_cmse.h"

> +       extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h 

> arm_cmse.h arm_mve.h"

>          target_type_format_char='%'

>          c_target_objs="arm-c.o"

>          cxx_target_objs="arm-c.o"

> diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h

> index 

> 72782758853a869bcb9a9d69f3fa0da979cd711f..28cde153f704748f35c84d072b59e9695a61e661 

> 100644

> --- a/gcc/config/arm/aout.h

> +++ b/gcc/config/arm/aout.h

> @@ -53,7 +53,9 @@

>  /* The assembler's names for the registers.  Note that the ?xx 

> registers are

>     there so that VFPv3/NEON registers D16-D31 have the same spacing 

> as D0-D15

>     (each of which is overlaid on two S registers), although there are no

> -   actual single-precision registers which correspond to D16-D31.  */

> +   actual single-precision registers which correspond to D16-D31.  

> New register

> +   p0 is added which is used for MVE predicated cases.  */

> +

>  #ifndef REGISTER_NAMES

>  #define REGISTER_NAMES                                          \

>  {                                                               \

> @@ -72,7 +74,7 @@

>    "wr8",   "wr9",   "wr10", "wr11",                           \

>    "wr12",  "wr13",  "wr14", "wr15",                           \

>    "wcgr0", "wcgr1", "wcgr2", "wcgr3",                          \

> -  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge"               \

> +  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0"         \

>  }

>  #endif

>

> diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c

> index 

> 650b22c7ad916d9abd587981e9ed5809755ee035..d4cb0ea3deb49b10266d1620c85e243ed34aee4d 

> 100644

> --- a/gcc/config/arm/arm-builtins.c

> +++ b/gcc/config/arm/arm-builtins.c

> @@ -667,6 +667,7 @@ enum arm_builtins

>    ARM_BUILTIN_SET_FPSCR,

>

>    ARM_BUILTIN_CMSE_NONSECURE_CALLER,

> +  ARM_BUILTIN_SIMD_LANE_CHECK,

>

>  #undef CRYPTO1

>  #undef CRYPTO2

> @@ -692,7 +693,6 @@ enum arm_builtins

>  #include "arm_vfp_builtins.def"

>

>    ARM_BUILTIN_NEON_BASE,

> -  ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,

>

>  #include "arm_neon_builtins.def"

>

> @@ -948,26 +948,35 @@ arm_init_simd_builtin_types (void)

>       an entry in our mangling table, consequently, they get default

>       mangling.  As a further gotcha, poly8_t and poly16_t are signed

>       types, poly64_t and poly128_t are unsigned types.  */

> -  arm_simd_polyQI_type_node

> -    = build_distinct_type_copy (intQI_type_node);

> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,

> - "__builtin_neon_poly8");

> -  arm_simd_polyHI_type_node

> -    = build_distinct_type_copy (intHI_type_node);

> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,

> - "__builtin_neon_poly16");

> -  arm_simd_polyDI_type_node

> -    = build_distinct_type_copy (unsigned_intDI_type_node);

> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,

> - "__builtin_neon_poly64");

> -  arm_simd_polyTI_type_node

> -    = build_distinct_type_copy (unsigned_intTI_type_node);

> -  (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,

> - "__builtin_neon_poly128");

> -  /* Prevent front-ends from transforming poly vectors into string

> -     literals.  */

> -  TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;

> -  TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;

> +  if (!TARGET_HAVE_MVE)

> +    {

> +      arm_simd_polyQI_type_node

> +       = build_distinct_type_copy (intQI_type_node);

> +      (*lang_hooks.types.register_builtin_type) 

> (arm_simd_polyQI_type_node,

> + "__builtin_neon_poly8");

> +      arm_simd_polyHI_type_node

> +       = build_distinct_type_copy (intHI_type_node);

> +      (*lang_hooks.types.register_builtin_type) 

> (arm_simd_polyHI_type_node,

> + "__builtin_neon_poly16");

> +      arm_simd_polyDI_type_node

> +       = build_distinct_type_copy (unsigned_intDI_type_node);

> +      (*lang_hooks.types.register_builtin_type) 

> (arm_simd_polyDI_type_node,

> + "__builtin_neon_poly64");

> +      arm_simd_polyTI_type_node

> +       = build_distinct_type_copy (unsigned_intTI_type_node);

> +      (*lang_hooks.types.register_builtin_type) 

> (arm_simd_polyTI_type_node,

> + "__builtin_neon_poly128");

> +      /* Init poly vector element types with scalar poly types.  */

> +      arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;

> +      arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;

> +      arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;

> +      arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;

> +

> +      /* Prevent front-ends from transforming poly vectors into string

> +        literals.  */

> +      TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;

> +      TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;

> +    }

>

>    /* Init all the element types built by the front-end.  */

>    arm_simd_types[Int8x8_t].eltype = intQI_type_node;

> @@ -985,11 +994,6 @@ arm_init_simd_builtin_types (void)

>    arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;

>    arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;

>

> -  /* Init poly vector element types with scalar poly types.  */

> -  arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;

> -  arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;

> -  arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;

> -  arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;

>    /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default

>       mangling.  */

>

> @@ -1006,6 +1010,8 @@ arm_init_simd_builtin_types (void)

>        tree eltype = arm_simd_types[i].eltype;

>        machine_mode mode = arm_simd_types[i].mode;

>

> +      if (eltype == NULL)

> +       continue;

>        if (arm_simd_types[i].itype == NULL)

>          arm_simd_types[i].itype =

>            build_distinct_type_copy

> @@ -1231,15 +1237,6 @@ arm_init_neon_builtins (void)

>       system.  */

>    arm_init_simd_builtin_scalar_types ();

>

> -  tree lane_check_fpr = build_function_type_list (void_type_node,

> - intSI_type_node,

> - intSI_type_node,

> -                                                 NULL);

> -  arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] =

> -      add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,

> -                           ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD,

> -                           NULL, NULL_TREE);

> -

>    for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)

>      {

>        arm_builtin_datum *d = &neon_builtin_data[i];

> @@ -1956,6 +1953,15 @@ arm_init_builtins (void)

>

>    if (TARGET_MAYBE_HARD_FLOAT)

>      {

> +      tree lane_check_fpr = build_function_type_list (void_type_node,

> + intSI_type_node,

> + intSI_type_node,

> +                                                     NULL);

> +      arm_builtin_decls[ARM_BUILTIN_SIMD_LANE_CHECK]

> +      = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,

> +                             ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,

> +                             NULL, NULL_TREE);

> +

>        arm_init_neon_builtins ();

>        arm_init_vfp_builtins ();

>        arm_init_crypto_builtins ();

> @@ -2201,6 +2207,47 @@ neon_dereference_pointer (tree exp, tree type, 

> machine_mode mem_mode,

>                        build_int_cst (build_pointer_type (array_type), 

> 0));

>  }

>

> +/* EXP is a pointer argument to a vector scatter store intrinsics.

> +

> +   Consider the following example:

> +       VSTRW<v>.<dt> Qd, [Qm{, #+/-<imm>}]!

> +   When <Qm> used as the base register for the target address,

> +   this function is used to derive and return an expression for the

> +   accessed memory.

> +

> +   The intrinsic function operates on a block of registers that has mode

> +   REG_MODE.  This block contains vectors of type TYPE_MODE.  The 

> function

> +   references the memory at EXP of type TYPE and in mode MEM_MODE.  This

> +   mode may be BLKmode if no more suitable mode is available.  */

> +

> +static tree

> +mve_dereference_pointer (tree exp, tree type, machine_mode reg_mode,

> +                        machine_mode vector_mode)

> +{

> +  HOST_WIDE_INT reg_size, vector_size, nelems;

> +  tree elem_type, upper_bound, array_type;

> +

> +  /* Work out the size of each vector in bytes.  */

> +  vector_size = GET_MODE_SIZE (vector_mode);

> +

> +  /* Work out the size of the register block in bytes.  */

> +  reg_size = GET_MODE_SIZE (reg_mode);

> +

> +  /* Work out the type of each element.  */

> +  gcc_assert (POINTER_TYPE_P (type));

> +  elem_type = TREE_TYPE (type);

> +

> +  nelems = reg_size / vector_size;

> +

> +  /* Create a type that describes the full access.  */

> +  upper_bound = build_int_cst (size_type_node, nelems - 1);

> +  array_type = build_array_type (elem_type, build_index_type 

> (upper_bound));

> +

> +  /* Dereference EXP using that type.  */

> +  return fold_build2 (MEM_REF, array_type, exp,

> +                     build_int_cst (build_pointer_type (array_type), 0));

> +}

> +

>  /* Expand a builtin.  */

>  static rtx

>  arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,

> @@ -2239,10 +2286,17 @@ arm_expand_builtin_args (rtx target, 

> machine_mode map_mode, int fcode,

>              {

>                machine_mode other_mode

>                  = insn_data[icode].operand[1 - opno].mode;

> -              arg[argc] = neon_dereference_pointer (arg[argc],

> +             if (TARGET_HAVE_MVE && mode[argc] != other_mode)

> +               {

> +                 arg[argc] = mve_dereference_pointer (arg[argc],

> TREE_VALUE (formals),

> - mode[argc], other_mode,

> - map_mode);

> + other_mode, map_mode);

> +               }

> +             else

> +               arg[argc] = neon_dereference_pointer (arg[argc],

> + TREE_VALUE (formals),

> + mode[argc], other_mode,

> + map_mode);

>              }

>

>            /* Use EXPAND_MEMORY for ARG_BUILTIN_MEMORY and

> @@ -2548,22 +2602,6 @@ arm_expand_neon_builtin (int fcode, tree exp, 

> rtx target)

>        return const0_rtx;

>      }

>

> -  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)

> -    {

> -      /* Builtin is only to check bounds of the lane passed to some 

> intrinsics

> -        that are implemented with gcc vector extensions in 

> arm_neon.h.  */

> -

> -      tree nlanes = CALL_EXPR_ARG (exp, 0);

> -      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);

> -      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));

> -      if (CONST_INT_P (lane_idx))

> -       neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);

> -      else

> -       error ("%Klane index must be a constant immediate", exp);

> -      /* Don't generate any RTL.  */

> -      return const0_rtx;

> -    }

> -

>    arm_builtin_datum *d

>      = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];

>

> @@ -2625,6 +2663,22 @@ arm_expand_builtin (tree exp,

>    int mask;

>    int imm;

>

> +  if (fcode == ARM_BUILTIN_SIMD_LANE_CHECK)

> +    {

> +      /* Builtin is only to check bounds of the lane passed to some 

> intrinsics

> +        that are implemented with gcc vector extensions in 

> arm_neon.h.  */

> +

> +      tree nlanes = CALL_EXPR_ARG (exp, 0);

> +      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);

> +      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));

> +      if (CONST_INT_P (lane_idx))

> +       neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);

> +      else

> +       error ("%Klane index must be a constant immediate", exp);

> +      /* Don't generate any RTL.  */

> +      return const0_rtx;

> +    }

> +

>    if (fcode >= ARM_BUILTIN_ACLE_BASE)

>      return arm_expand_acle_builtin (fcode, exp, target);

>

> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c

> index 

> 34695fa0112e90e4bdf317da0b9fd1d3194bf0a2..0fe7d371c348818f25901c5d84be94589523c9a6 

> 100644

> --- a/gcc/config/arm/arm-c.c

> +++ b/gcc/config/arm/arm-c.c

> @@ -79,6 +79,16 @@ arm_cpu_builtins (struct cpp_reader* pfile)

>    def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);

>    def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);

>

> +  cpp_undef (pfile, "__ARM_FEATURE_MVE");

> +  if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)

> +    {

> +      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 3);

> +    }

> +  else if (TARGET_HAVE_MVE)

> +    {

> +      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 1);

> +    }

> +

>    cpp_undef (pfile, "__ARM_FEATURE_CMSE");

>    if (arm_arch8 && !arm_arch_notm)

>      {

> diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h

> index 

> 5b49049cc45c0bccfa9d67eac0940250fc5dd95a..d4612ae4553697989611d772f7bb0061a04b98b6 

> 100644

> --- a/gcc/config/arm/arm-protos.h

> +++ b/gcc/config/arm/arm-protos.h

> @@ -85,7 +85,7 @@ extern bool ldm_stm_operation_p (rtx, bool, 

> machine_mode mode,

>  extern bool clear_operation_p (rtx, bool);

>  extern int arm_const_double_rtx (rtx);

>  extern int vfp3_const_double_rtx (rtx);

> -extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, 

> int *);

> +extern int simd_immediate_valid_for_move (rtx, machine_mode, rtx *, 

> int *);

>  extern int neon_immediate_valid_for_logic (rtx, machine_mode, int, rtx *,

>                                             int *);

>  extern int neon_immediate_valid_for_shift (rtx, machine_mode, rtx *,

> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h

> index 

> 8b07c423fb6b071642fccc48424fe244d97dcbc2..c755df420b52798773ee99f54faf6689d4a16215 

> 100644

> --- a/gcc/config/arm/arm.h

> +++ b/gcc/config/arm/arm.h

> @@ -751,7 +751,8 @@ extern int arm_arch_cmse;

>  /*      s0-s15          VFP scratch (aka d0-d7).

>          s16-s31       S  VFP variable (aka d8-d15).

>          vfpcc           Not a real register.  Represents the VFP 

> condition

> -                       code flags.  */

> +                       code flags.

> +       vpr             Used to represent MVE VPR predication.  */

>

>  /* The stack backtrace structure is as follows:

>    fp points to here:  |  save code pointer  |      [fp]

> @@ -792,7 +793,7 @@ extern int arm_arch_cmse;

>    1,1,1,1,1,1,1,1,             \

>    1,1,1,1,                     \

>    /* Specials.  */             \

> -  1,1,1,1,1,1                  \

> +  1,1,1,1,1,1,1                        \

>  }

>

>  /* 1 for registers not available across function calls.

> @@ -822,7 +823,7 @@ extern int arm_arch_cmse;

>    1,1,1,1,1,1,1,1,             \

>    1,1,1,1,                     \

>    /* Specials.  */             \

> -  1,1,1,1,1,1                  \

> +  1,1,1,1,1,1,1                        \

>  }

>

>  #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE

> @@ -998,10 +999,10 @@ extern int arm_arch_cmse;

>     && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))

>

>  /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP

> -   + 1 APSRQ + 1 APSRGE.  */

> +   + 1 APSRQ + 1 APSRGE + 1 VPR.  */

>  /* Intel Wireless MMX Technology registers add 16 + 4 more.  */

>  /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */

> -#define FIRST_PSEUDO_REGISTER   106

> +#define FIRST_PSEUDO_REGISTER   107

>

>  #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)

>

> @@ -1029,11 +1030,26 @@ extern int arm_arch_cmse;

>    ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \

>     || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)

>

> +#define VALID_MVE_MODE(MODE) \

> +  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \

> +   || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \

> +   || (MODE) == V2DFmode)

> +

> +#define VALID_MVE_SI_MODE(MODE) \

> +  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \

> +   || (MODE) == V16QImode)

> +

> +#define VALID_MVE_SF_MODE(MODE) \

> +  ((MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DFmode)

> +

>  /* Structure modes valid for Neon registers.  */

>  #define VALID_NEON_STRUCT_MODE(MODE) \

>    ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \

>     || (MODE) == CImode || (MODE) == XImode)

>

> +#define VALID_MVE_STRUCT_MODE(MODE) \

> +  ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)

> +

>  /* The register numbers in sequence, for passing to 

> arm_gen_load_multiple.  */

>  extern int arm_regs_in_sequence[];

>

> @@ -1085,9 +1101,13 @@ extern int arm_regs_in_sequence[];

>    /* Registers not for general use.  */                \

>    CC_REGNUM, VFPCC_REGNUM,                     \

>    FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,    \

> -  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM    \

> +  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,   \

> +  VPR_REGNUM                                   \

>  }

>

> +#define IS_VPR_REGNUM(REGNUM) \

> +  ((REGNUM) == VPR_REGNUM)

> +

>  /* Use different register alloc ordering for Thumb.  */

>  #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()

>

> @@ -1124,6 +1144,7 @@ enum reg_class

>    VFPCC_REG,

>    SFP_REG,

>    AFP_REG,

> +  VPR_REG,

>    ALL_REGS,

>    LIM_REG_CLASSES

>  };

> @@ -1131,7 +1152,7 @@ enum reg_class

>  #define N_REG_CLASSES  (int) LIM_REG_CLASSES

>

>  /* Give names of register classes as strings for dump file.  */

> -#define REG_CLASS_NAMES  \

> +#define REG_CLASS_NAMES \

>  {                       \

>    "NO_REGS",           \

>    "LO_REGS",           \

> @@ -1151,6 +1172,7 @@ enum reg_class

>    "VFPCC_REG",         \

>    "SFP_REG",           \

>    "AFP_REG",           \

> +  "VPR_REG",           \

>    "ALL_REGS"           \

>  }

>

> @@ -1177,7 +1199,8 @@ enum reg_class

>    { 0x00000000, 0x00000000, 0x00000000, 0x00000020 }, /* VFPCC_REG */  \

>    { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */    \

>    { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */    \

> -  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS */   \

> +  { 0x00000000, 0x00000000, 0x00000000, 0x00000100 }, /* VPR_REG.  */  \

> +  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000010F }  /* ALL_REGS.  */ \

>  }

>

>  #define FP_SYSREGS \

> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c

> index 

> 883c2a9179d7e6d69225f8d104228d15702ecef7..6faed76206b93c1a9dea048e2f693dc16ee58072 

> 100644

> --- a/gcc/config/arm/arm.c

> +++ b/gcc/config/arm/arm.c

> @@ -3759,7 +3759,8 @@ arm_options_perform_arch_sanity_checks (void)

>        else if (TARGET_HARD_FLOAT_ABI)

>          {

>            arm_pcs_default = ARM_PCS_AAPCS_VFP;

> -         if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2))

> +         if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2)

> +             && !bitmap_bit_p (arm_active_target.isa, isa_bit_mve))

>              error ("%<-mfloat-abi=hard%>: selected processor lacks an 

> FPU");

>          }

>        else

> @@ -4230,7 +4231,7 @@ use_return_insn (int iscond, rtx sibling)

>

>    /* Can't be done if any of the VFP regs are pushed,

>       since this also requires an insn.  */

> -  if (TARGET_HARD_FLOAT)

> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)

>      for (regno = FIRST_VFP_REGNUM; regno <= LAST_VFP_REGNUM; regno++)

>   ��    if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p 

> (regno))

>          return 0;

> @@ -6289,7 +6290,7 @@ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, 

> machine_mode mode,

>        {

>          pcum->aapcs_vfp_reg_alloc = mask << regno;

>          if (mode == BLKmode

> -           || (mode == TImode && ! TARGET_NEON)

> +           || (mode == TImode && ! (TARGET_NEON || TARGET_HAVE_MVE))

>              || ! arm_hard_regno_mode_ok (FIRST_VFP_REGNUM + regno, mode))

>            {

>              int i;

> @@ -6297,7 +6298,7 @@ aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, 

> machine_mode mode,

>              int rshift = shift;

>              machine_mode rmode = pcum->aapcs_vfp_rmode;

>              rtx par;

> -           if (!TARGET_NEON)

> +           if (!(TARGET_NEON || TARGET_HAVE_MVE))

>                {

>                  /* Avoid using unsupported vector modes. */

>                  if (rmode == V2SImode)

> @@ -6343,7 +6344,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs 

> pcs_variant ATTRIBUTE_UNUSED,

>    if (mode == BLKmode

>        || (GET_MODE_CLASS (mode) == MODE_INT

>            && GET_MODE_SIZE (mode) >= GET_MODE_SIZE (TImode)

> -         && !TARGET_NEON))

> +         && !(TARGET_NEON || TARGET_HAVE_MVE)))

>      {

>        int count;

>        machine_mode ag_mode;

> @@ -6354,7 +6355,7 @@ aapcs_vfp_allocate_return_reg (enum arm_pcs 

> pcs_variant ATTRIBUTE_UNUSED,

>        aapcs_vfp_is_call_or_return_candidate (pcs_variant, mode, type,

>                                               &ag_mode, &count);

>

> -      if (!TARGET_NEON)

> +      if (!(TARGET_NEON || TARGET_HAVE_MVE))

>          {

>            if (ag_mode == V2SImode)

>              ag_mode = DImode;

> @@ -8253,7 +8254,9 @@ thumb2_legitimate_address_p (machine_mode mode, 

> rtx x, int strict_p)

>                     && CONST_INT_P (XEXP (XEXP (x, 0), 1)))))

>      return 1;

>

> -  else if (mode == TImode || (TARGET_NEON && VALID_NEON_STRUCT_MODE 

> (mode)))

> +  else if (mode == TImode

> +          || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode))

> +          || (TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode)))

>      return 0;

>

>    else if (code == PLUS)

> @@ -9800,7 +9803,7 @@ arm_rtx_costs_internal (rtx x, enum rtx_code 

> code, enum rtx_code outer_code,

>            /* Assume that most copies can be done with a single insn,

>               unless we don't have HW FP, in which case everything

>               larger than word mode will require two insns. */

> -         *cost = COSTS_N_INSNS (((!TARGET_HARD_FLOAT

> +         *cost = COSTS_N_INSNS (((!(TARGET_HARD_FLOAT || TARGET_HAVE_MVE)

>                                     && GET_MODE_SIZE (mode) > 4)

>                                    || mode == DImode)

>                                   ? 2 : 1);

> @@ -11281,10 +11284,10 @@ arm_rtx_costs_internal (rtx x, enum rtx_code 

> code, enum rtx_code outer_code,

>

>      case CONST_VECTOR:

>        /* Fixme.  */

> -      if (TARGET_NEON

> -         && TARGET_HARD_FLOAT

> -         && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))

> -         && neon_immediate_valid_for_move (x, mode, NULL, NULL))

> +      if (((TARGET_NEON && TARGET_HARD_FLOAT

> +           && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE 

> (mode)))

> +          || TARGET_HAVE_MVE)

> +         && simd_immediate_valid_for_move (x, mode, NULL, NULL))

>          *cost = COSTS_N_INSNS (1);

>        else

>          *cost = COSTS_N_INSNS (4);

> @@ -12328,8 +12331,8 @@ vfp3_const_double_rtx (rtx x)

>    return vfp3_const_double_index (x) != -1;

>  }

>

> -/* Recognize immediates which can be used in various Neon 

> instructions. Legal

> -   immediates are described by the following table (for VMVN 

> variants, the

> +/* Recognize immediates which can be used in various Neon and MVE 

> instructions.

> +   Legal immediates are described by the following table (for VMVN 

> variants, the

>     bitwise inverse of the constant shown is recognized. In either 

> case, VMOV

>     is output and the correct instruction to use for a given constant 

> is chosen

>     by the assembler). The constant shown is replicated across all 

> elements of

> @@ -12380,7 +12383,7 @@ vfp3_const_double_rtx (rtx x)

>     -1 if the given value doesn't match any of the listed patterns.

>  */

>  static int

> -neon_valid_immediate (rtx op, machine_mode mode, int inverse,

> +simd_valid_immediate (rtx op, machine_mode mode, int inverse,

>                        rtx *modconst, int *elementwidth)

>  {

>  #define CHECK(STRIDE, ELSIZE, CLASS, TEST)      \

> @@ -12412,6 +12415,10 @@ neon_valid_immediate (rtx op, machine_mode 

> mode, int inverse,

>

>    innersize = GET_MODE_UNIT_SIZE (mode);

>

> +  /* Only support 128-bit vectors for MVE.  */

> +  if (TARGET_HAVE_MVE && (!vector || n_elts * innersize != 16))

> +    return -1;

> +

>    /* Vectors of float constants.  */

>    if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)

>      {

> @@ -12560,18 +12567,19 @@ neon_valid_immediate (rtx op, machine_mode 

> mode, int inverse,

>  #undef CHECK

>  }

>

> -/* Return TRUE if rtx X is legal for use as either a Neon VMOV (or, 

> implicitly,

> -   VMVN) immediate. Write back width per element to *ELEMENTWIDTH (or 

> zero for

> -   float elements), and a modified constant (whatever should be 

> output for a

> -   VMOV) in *MODCONST.  */

> -

> +/* Return TRUE if rtx X is legal for use as either a Neon or MVE VMOV 

> (or,

> +   implicitly, VMVN) immediate.  Write back width per element to 

> *ELEMENTWIDTH

> +   (or zero for float elements), and a modified constant (whatever 

> should be

> +   output for a VMOV) in *MODCONST. "neon_immediate_valid_for_move" 

> function is

> +   modified to "simd_immediate_valid_for_move" as this function will 

> be used

> +   both by neon and mve.  */

>  int

> -neon_immediate_valid_for_move (rtx op, machine_mode mode,

> +simd_immediate_valid_for_move (rtx op, machine_mode mode,

>                                 rtx *modconst, int *elementwidth)

>  {

>    rtx tmpconst;

>    int tmpwidth;

> -  int retval = neon_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);

> +  int retval = simd_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);

>

>    if (retval == -1)

>      return 0;

> @@ -12588,7 +12596,7 @@ neon_immediate_valid_for_move (rtx op, 

> machine_mode mode,

>  /* Return TRUE if rtx X is legal for use in a VORR or VBIC 

> instruction.  If

>     the immediate is valid, write a constant suitable for using as an 

> operand

>     to VORR/VBIC/VAND/VORN to *MODCONST and the corresponding element 

> width to

> -   *ELEMENTWIDTH. See neon_valid_immediate for description of 

> INVERSE.  */

> +   *ELEMENTWIDTH.  See simd_valid_immediate for description of 

> INVERSE.  */

>

>  int

>  neon_immediate_valid_for_logic (rtx op, machine_mode mode, int inverse,

> @@ -12596,7 +12604,7 @@ neon_immediate_valid_for_logic (rtx op, 

> machine_mode mode, int inverse,

>  {

>    rtx tmpconst;

>    int tmpwidth;

> -  int retval = neon_valid_immediate (op, mode, inverse, &tmpconst, 

> &tmpwidth);

> +  int retval = simd_valid_immediate (op, mode, inverse, &tmpconst, 

> &tmpwidth);

>

>    if (retval < 0 || retval > 5)

>      return 0;

> @@ -12803,7 +12811,7 @@ neon_make_constant (rtx vals)

>      gcc_unreachable ();

>

>    if (const_vec != NULL

> -      && neon_immediate_valid_for_move (const_vec, mode, NULL, NULL))

> +      && simd_immediate_valid_for_move (const_vec, mode, NULL, NULL))

>      /* Load using VMOV.  On Cortex-A8 this takes one cycle.  */

>      return const_vec;

>    else if ((target = neon_vdup_constant (vals)) != NULL_RTX)

> @@ -13080,6 +13088,15 @@ neon_vector_mem_operand (rtx op, int type, 

> bool strict)

>        && (INTVAL (XEXP (ind, 1)) & 3) == 0)

>      return TRUE;

>

> +  if (type == 1 && TARGET_HAVE_MVE

> +      && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC))

> +    {

> +      rtx ind1 = XEXP (ind, 0);

> +      if (!REG_P (ind1))

> +       return 0;

> +      return NEON_REGNO_OK_FOR_QUAD (REGNO (ind1));

> +    }

> +

>    return FALSE;

>  }

>

> @@ -19936,7 +19953,7 @@ output_move_neon (rtx *operands)

>      {

>      case POST_INC:

>        /* We have to use vldm / vstm for too-large modes. */

> -      if (nregs > 4)

> +      if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))

>          {

>            templ = "v%smia%%?\t%%0!, %%h1";

>            ops[0] = XEXP (addr, 0);

> @@ -19965,7 +19982,7 @@ output_move_neon (rtx *operands)

>        /* We have to use vldm / vstm for too-large modes. */

>        if (nregs > 1)

>          {

> -         if (nregs > 4)

> +         if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))

>              templ = "v%smia%%?\t%%m0, %%h1";

>            else

>              templ = "v%s1.64\t%%h1, %%A0";

> @@ -19980,29 +19997,40 @@ output_move_neon (rtx *operands)

>        {

>          int i;

>          int overlap = -1;

> -       for (i = 0; i < nregs; i++)

> +       if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN)

>            {

> -           /* We're only using DImode here because it's a convenient 

> size.  */

> -           ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);

> -           ops[1] = adjust_address (mem, DImode, 8 * i);

> -           if (reg_overlap_mentioned_p (ops[0], mem))

> +           sprintf (buff, "v%srw.32\t%%q0, %%1", load ? "ld" : "st");

> +           ops[0] = reg;

> +           ops[1] = mem;

> +           output_asm_insn (buff, ops);

> +         }

> +       else

> +         {

> +           for (i = 0; i < nregs; i++)

>                {

> -               gcc_assert (overlap == -1);

> -               overlap = i;

> +               /* We're only using DImode here because it's a convenient

> +                  size.  */

> +               ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);

> +               ops[1] = adjust_address (mem, DImode, 8 * i);

> +               if (reg_overlap_mentioned_p (ops[0], mem))

> +                 {

> +                   gcc_assert (overlap == -1);

> +                   overlap = i;

> +                 }

> +               else

> +                 {

> +                   sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : 

> "st");

> +                   output_asm_insn (buff, ops);

> +                 }

>                }

> -           else

> +           if (overlap != -1)

>                {

> +               ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);

> +               ops[1] = adjust_address (mem, SImode, 8 * overlap);

>                  sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");

>                  output_asm_insn (buff, ops);

>                }

>            }

> -       if (overlap != -1)

> -         {

> -           ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);

> -           ops[1] = adjust_address (mem, SImode, 8 * overlap);

> -           sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");

> -           output_asm_insn (buff, ops);

> -         }

>

>          return "";

>        }

> @@ -22223,7 +22251,7 @@ arm_compute_frame_layout (void)

>        func_type = arm_current_func_type ();

>        /* Space for saved VFP registers.  */

>        if (! IS_VOLATILE (func_type)

> -         && TARGET_HARD_FLOAT)

> +         && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))

>          saved += arm_get_vfp_saved_size ();

>

>        /* Allocate space for saving/restoring FPCXTNS in Armv8.1-M 

> Mainline

> @@ -22447,7 +22475,7 @@ arm_save_coproc_regs(void)

>          saved_size += 8;

>        }

>

> -  if (TARGET_HARD_FLOAT)

> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)

>      {

>        start_reg = FIRST_VFP_REGNUM;

>

> @@ -23749,6 +23777,53 @@ arm_print_operand (FILE *stream, rtx x, int code)

>        }

>        return;

>

> +    /* To print the memory operand with "Us" constraint. Based on the 

> rtx_code

> +       the memory operands output looks like following.

> +       1. [Rn], #+/-<imm>

> +       2. [Rn, #+/-<imm>]!

> +       3. [Rn].  */

> +    case 'E':

> +      {

> +       rtx addr;

> +       rtx postinc_reg = NULL;

> +       unsigned inc_val = 0;

> +       enum rtx_code code;

> +

> +       gcc_assert (MEM_P (x));

> +       addr = XEXP (x, 0);

> +       code = GET_CODE (addr);

> +       if (code == POST_INC || code == POST_DEC || code == PRE_INC

> +           || code  == PRE_DEC)

> +         {

> +           asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));

> +           inc_val = GET_MODE_SIZE (GET_MODE (x));

> +           if (code == POST_INC || code == POST_DEC)

> +             asm_fprintf (stream, "], #%s%d",(code == POST_INC)

> +                                             ? "": "-", inc_val);

> +           else

> +             asm_fprintf (stream, ", #%s%d]!",(code == PRE_INC)

> +                                              ? "": "-", inc_val);

> +         }

> +       else if (code == POST_MODIFY || code == PRE_MODIFY)

> +         {

> +           asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));

> +           postinc_reg = XEXP ( XEXP (x, 1), 1);

> +           if (postinc_reg && CONST_INT_P (postinc_reg))

> +             {

> +               if (code == POST_MODIFY)

> +                 asm_fprintf (stream, "], #%wd",INTVAL (postinc_reg));

> +               else

> +                 asm_fprintf (stream, ", #%wd]!",INTVAL (postinc_reg));

> +             }

> +         }

> +       else

> +         {

> +           gcc_assert (REG_P (addr));

> +           asm_fprintf (stream, "[%r]",REGNO (addr));

> +         }

> +      }

> +      return;

> +

>      case 'C':

>        {

>          rtx addr;

> @@ -23926,9 +24001,10 @@ arm_print_operand_address (FILE *stream, 

> machine_mode mode, rtx x)

>                           REGNO (XEXP (x, 0)),

>                           GET_CODE (x) == PRE_DEC ? "-" : "",

>                           GET_MODE_SIZE (mode));

> +         else if (TARGET_HAVE_MVE && (mode == OImode || mode == XImode))

> +           asm_fprintf (stream, "[%r]!", REGNO (XEXP (x,0)));

>            else

> -           asm_fprintf (stream, "[%r], #%s%d",

> -                        REGNO (XEXP (x, 0)),

> +           asm_fprintf (stream, "[%r], #%s%d", REGNO (XEXP (x, 0)),

>                           GET_CODE (x) == POST_DEC ? "-" : "",

>                           GET_MODE_SIZE (mode));

>          }

> @@ -24773,12 +24849,15 @@ arm_hard_regno_mode_ok (unsigned int regno, 

> machine_mode mode)

>  {

>    if (GET_MODE_CLASS (mode) == MODE_CC)

>      return (regno == CC_REGNUM

> -           || (TARGET_HARD_FLOAT

> +           || ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE)

>                  && regno == VFPCC_REGNUM));

>

>    if (regno == CC_REGNUM && GET_MODE_CLASS (mode) != MODE_CC)

>      return false;

>

> +  if (IS_VPR_REGNUM (regno))

> +    return true;

> +

>    if (TARGET_THUMB1)

>      /* For the Thumb we only allow values bigger than SImode in

>         registers 0 - 6, so that there is always a second low

> @@ -24787,7 +24866,7 @@ arm_hard_regno_mode_ok (unsigned int regno, 

> machine_mode mode)

>         start of an even numbered register pair.  */

>      return (ARM_NUM_REGS (mode) < 2) || (regno < LAST_LO_REGNUM);

>

> -  if (TARGET_HARD_FLOAT && IS_VFP_REGNUM (regno))

> +  if ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && IS_VFP_REGNUM (regno))

>      {

>        if (mode == SFmode || mode == SImode)

>          return VFP_REGNO_OK_FOR_SINGLE (regno);

> @@ -24811,6 +24890,10 @@ arm_hard_regno_mode_ok (unsigned int regno, 

> machine_mode mode)

>                 || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))

>                 || (mode == CImode && NEON_REGNO_OK_FOR_NREGS (regno, 6))

>                 || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8));

> +     if (TARGET_HAVE_MVE)

> +       return ((VALID_MVE_MODE (mode) && NEON_REGNO_OK_FOR_QUAD (regno))

> +              || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))

> +              || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)));

>

>        return false;

>      }

> @@ -24859,13 +24942,18 @@ arm_modes_tieable_p (machine_mode mode1, 

> machine_mode mode2)

>    /* We specifically want to allow elements of "structure" modes to

>       be tieable to the structure.  This more general condition allows

>       other rarer situations too.  */

> -  if (TARGET_NEON

> -      && (VALID_NEON_DREG_MODE (mode1)

> -         || VALID_NEON_QREG_MODE (mode1)

> -         || VALID_NEON_STRUCT_MODE (mode1))

> -      && (VALID_NEON_DREG_MODE (mode2)

> -         || VALID_NEON_QREG_MODE (mode2)

> -         || VALID_NEON_STRUCT_MODE (mode2)))

> +  if ((TARGET_NEON

> +       && (VALID_NEON_DREG_MODE (mode1)

> +          || VALID_NEON_QREG_MODE (mode1)

> +          || VALID_NEON_STRUCT_MODE (mode1))

> +       && (VALID_NEON_DREG_MODE (mode2)

> +          || VALID_NEON_QREG_MODE (mode2)

> +          || VALID_NEON_STRUCT_MODE (mode2)))

> +      || (TARGET_HAVE_MVE

> +         && (VALID_MVE_MODE (mode1)

> +             || VALID_MVE_STRUCT_MODE (mode1))

> +         && (VALID_MVE_MODE (mode2)

> +             || VALID_MVE_STRUCT_MODE (mode2))))

>      return true;

>

>    return false;

> @@ -24880,6 +24968,9 @@ arm_regno_class (int regno)

>    if (regno == PC_REGNUM)

>      return NO_REGS;

>

> +  if (IS_VPR_REGNUM (regno))

> +    return VPR_REG;

> +

>    if (TARGET_THUMB1)

>      {

>        if (regno == STACK_POINTER_REGNUM)

> @@ -26731,7 +26822,7 @@ arm_expand_epilogue_apcs_frame (bool 

> really_return)

>          floats_from_frame += 4;

>        }

>

> -  if (TARGET_HARD_FLOAT)

> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)

>      {

>        int start_reg;

>        rtx ip_rtx = gen_rtx_REG (SImode, IP_REGNUM);

> @@ -26977,7 +27068,7 @@ arm_expand_epilogue (bool really_return)

>          }

>      }

>

> -  if (TARGET_HARD_FLOAT)

> +  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)

>      {

>        /* Generate VFP register multi-pop.  */

>        int end_reg = LAST_VFP_REGNUM + 1;

> @@ -27148,7 +27239,7 @@ arm_expand_epilogue (bool really_return)

>                                                     GEN_INT 

> (FPCXTNS_ENUM)));

>            RTX_FRAME_RELATED_P (insn) = 1;

>          }

> -      }

> +    }

>

>    if (!really_return)

>      return;

> @@ -28370,6 +28461,15 @@ arm_vector_mode_supported_p (machine_mode mode)

>        || mode == V2HAmode))

>      return true;

>

> +  if (TARGET_HAVE_MVE

> +      && (mode == V2DImode || mode == V4SImode || mode == V8HImode

> +         || mode == V16QImode))

> +      return true;

> +

> +  if (TARGET_HAVE_MVE_FLOAT

> +      && (mode == V2DFmode || mode == V4SFmode || mode == V8HFmode))

> +      return true;

> +

>    return false;

>  }

>

> @@ -28387,6 +28487,10 @@ arm_array_mode_supported_p (machine_mode mode,

>        && (nelems >= 2 && nelems <= 4))

>      return true;

>

> +  if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN

> +      && VALID_MVE_MODE (mode) && (nelems == 2 || nelems == 4))

> +    return true;

> +

>    return false;

>  }

>

> @@ -29435,7 +29539,7 @@ arm_conditional_register_usage (void)

>    if (TARGET_THUMB1)

>      fixed_regs[LR_REGNUM] = call_used_regs[LR_REGNUM] = 1;

>

> -  if (TARGET_32BIT && TARGET_HARD_FLOAT)

> +  if (TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))

>      {

>        /* VFPv3 registers are disabled when earlier VFP

>           versions are selected due to the definition of

> @@ -29447,6 +29551,8 @@ arm_conditional_register_usage (void)

>            call_used_regs[regno] = regno < FIRST_VFP_REGNUM + 16

>              || regno >= FIRST_VFP_REGNUM + 32;

>          }

> +      if (TARGET_HAVE_MVE)

> +       fixed_regs[VPR_REGNUM] = 0;

>      }

>

>    if (TARGET_REALLY_IWMMXT && !TARGET_GENERAL_REGS_ONLY)

> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md

> index 

> c62ad1b360ebecd5368e90ea5634488eef22f2fc..689baa0b0ff63ef90f47d2fd844cb98c9a1457a0 

> 100644

> --- a/gcc/config/arm/arm.md

> +++ b/gcc/config/arm/arm.md

> @@ -41,6 +41,7 @@

>     (VFPCC_REGNUM    101)       ; VFP Condition code pseudo register

>     (APSRQ_REGNUM    104)       ; Q bit pseudo register

>     (APSRGE_REGNUM   105)       ; GE bits pseudo register

> +   (VPR_REGNUM      106)       ; Vector Predication Register - MVE 

> register.

>    ]

>  )

>  ;; 3rd operand to select_dominance_cc_mode

> @@ -7293,7 +7294,7 @@

>    [(set (match_operand:SF 0 "nonimmediate_operand" "=r,r,m")

>          (match_operand:SF 1 "general_operand"  "r,mE,r"))]

>    "TARGET_32BIT

> -   && TARGET_SOFT_FLOAT

> +   && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE

>     && (!MEM_P (operands[0])

>         || register_operand (operands[1], SFmode))"

>  {

> @@ -7416,8 +7417,8 @@

>

>  (define_insn "*movdf_soft_insn"

>    [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=r,r,r,r,m")

> -       (match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]

> -  "TARGET_32BIT && TARGET_SOFT_FLOAT

> +       (match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]

> +  "TARGET_32BIT && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE

>     && (   register_operand (operands[0], DFmode)

>         || register_operand (operands[1], DFmode))"

>    "*

> @@ -11681,7 +11682,7 @@

>                     (match_operand:SI 2 "const_int_I_operand" "I")))

>       (set (match_operand:DF 3 "vfp_hard_register_operand" "")

>            (mem:DF (match_dup 1)))])]

> -  "TARGET_32BIT && TARGET_HARD_FLOAT"

> +  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)"

>    "*

>    {

>      int num_regs = XVECLEN (operands[0], 0);

> @@ -12624,7 +12625,7 @@

>     (set_attr "length" "8")]

>  )

>

> -;; Vector bits common to IWMMXT and Neon

> +;; Vector bits common to IWMMXT, Neon and MVE

>  (include "vec-common.md")

>  ;; Load the Intel Wireless Multimedia Extension patterns

>  (include "iwmmxt.md")

> @@ -12642,3 +12643,5 @@

>  (include "sync.md")

>  ;; Fixed-point patterns

>  (include "arm-fixed.md")

> +;; M-profile Vector Extensions

> +(include "mve.md")

> diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..5ffb466596b5d8fc330616a6fcc7ee37d3e28def

> --- /dev/null

> +++ b/gcc/config/arm/arm_mve.h

> @@ -0,0 +1,59 @@

> +/* Arm MVE intrinsics include file.

> +

> +   Copyright (C) 2019 Free Software Foundation, Inc.

> +   Contributed by Arm.

> +

> +   This file is part of GCC.

> +

> +   GCC is free software; you can redistribute it and/or modify it

> +   under the terms of the GNU General Public License as published

> +   by the Free Software Foundation; either version 3, or (at your

> +   option) any later version.

> +

> +   GCC is distributed in the hope that it will be useful, but WITHOUT

> +   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY

> +   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public

> +   License for more details.

> +

> +   You should have received a copy of the GNU General Public License

> +   along with GCC; see the file COPYING3.  If not see

> +   <http://www.gnu.org/licenses/>. */

> +

> +#ifndef _GCC_ARM_MVE_H

> +#define _GCC_ARM_MVE_H

> +

> +#if !__ARM_FEATURE_MVE

> +#error "MVE feature not supported"

> +#endif

> +

> +#include <stdint.h>

> +#ifndef  __cplusplus

> +#include <stdbool.h>

> +#endif

> +

> +#ifdef __cplusplus

> +extern "C" {

> +#endif

> +

> +#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */

> +typedef __fp16 float16_t;

> +typedef float float32_t;

> +typedef __simd128_float16_t float16x8_t;

> +typedef __simd128_float32_t float32x4_t;

> +#endif

> +

> +typedef uint16_t mve_pred16_t;

> +typedef __simd128_uint8_t uint8x16_t;

> +typedef __simd128_uint16_t uint16x8_t;

> +typedef __simd128_uint32_t uint32x4_t;

> +typedef __simd128_uint64_t uint64x2_t;

> +typedef __simd128_int8_t int8x16_t;

> +typedef __simd128_int16_t int16x8_t;

> +typedef __simd128_int32_t int32x4_t;

> +typedef __simd128_int64_t int64x2_t;

> +

> +#ifdef __cplusplus

> +}

> +#endif

> +

> +#endif /* _GCC_ARM_MVE_H.  */

> diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md

> index 

> 6f309b95cc1874ac7bc69e435781070e0c9cb70a..f77084a0efd489491372bb1dafbc0cd585f0f518 

> 100644

> --- a/gcc/config/arm/constraints.md

> +++ b/gcc/config/arm/constraints.md

> @@ -44,6 +44,8 @@

>  ;; in Thumb state: Uu, Uw

>  ;; in all states: Q

>

> +(define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG : NO_REGS"

> +  "MVE VPR register")

>

>  (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"

>   "The VFP registers @code{s0}-@code{s31}.")

> diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md

> index 

> c412851843f4468c2c18bce264288705e076ac50..e30325bc1652d378be2544fa32269c5c4294d7e9 

> 100644

> --- a/gcc/config/arm/iterators.md

> +++ b/gcc/config/arm/iterators.md

> @@ -62,6 +62,12 @@

>  ;; Integer and float modes supported by Neon and IWMMXT.

>  (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI 

> V4SF])

>

> +;; Integer and float modes supported by Neon, IWMMXT and MVE.

> +(define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])

> +

> +;; Integer and float modes supported by Neon and IWMMXT but not MVE.

> +(define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])

> +

>  ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.

>  (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])

>

> @@ -105,7 +111,8 @@

>  (define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI])

>

>  ;; Opaque structure types wider than TImode.

> -(define_mode_iterator VSTRUCT [EI OI CI XI])

> +(define_mode_iterator VSTRUCT [(EI "!TARGET_HAVE_MVE") OI

> +                              (CI "!TARGET_HAVE_MVE") XI])

>

>  ;; Opaque structure types used in table lookups (except vtbl1/vtbx1).

>  (define_mode_iterator VTAB [TI EI OI])

> diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..53334c6d329dedd482615b996232e85ded7a34f8

> --- /dev/null

> +++ b/gcc/config/arm/mve.md

> @@ -0,0 +1,78 @@

> +;; Arm M-profile Vector Extension Machine Description

> +;; Copyright (C) 2019 Free Software Foundation, Inc.

> +;;

> +;; This file is part of GCC.

> +;;

> +;; GCC is free software; you can redistribute it and/or modify it

> +;; under the terms of the GNU General Public License as published by

> +;; the Free Software Foundation; either version 3, or (at your option)

> +;; any later version.

> +;;

> +;; GCC is distributed in the hope that it will be useful, but

> +;; WITHOUT ANY WARRANTY; without even the implied warranty of

> +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU

> +;; General Public License for more details.

> +;;

> +;; You should have received a copy of the GNU General Public License

> +;; along with GCC; see the file COPYING3.  If not see

> +;; <http://www.gnu.org/licenses/>.

> +

> +(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])

> +(define_mode_attr V_sz_elem2 [(V16QI "s8") (V8HI "u16") (V4SI "u32")

> +                             (V2DI "u64")])

> +

> +(define_insn "*mve_mov<mode>"

> +  [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")

> +       (match_operand:MVE_types 1 "general_operand" 

> "w,r,w,Dn,Usi,r,Dm"))]

> +  "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"

> +{

> +  if (which_alternative == 3 || which_alternative == 6)

> +    {

> +      int width, is_valid;

> +      static char templ[40];

> +

> +      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,

> +       &operands[1], &width);

> +

> +      gcc_assert (is_valid != 0);

> +

> +      if (width == 0)

> +       return "vmov.f32\t%q0, %1  @ <mode>";

> +      else

> +       sprintf (templ, "vmov.i%d\t%%q0, %%x1  @ <mode>", width);

> +      return templ;

> +    }

> +  switch (which_alternative)

> +    {

> +    case 0:

> +      return "vmov\t%q0, %q1";

> +    case 1:

> +      return "vmov\t%e0, %Q1, %R1  @ <mode>\;vmov\t%f0, %J1, %K1";

> +    case 2:

> +      return "vmov\t%Q0, %R0, %e1  @ <mode>\;vmov\t%J0, %K0, %f1";

> +    case 4:

> +      if ((TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))

> +         || (MEM_P (operands[1])

> +             && GET_CODE (XEXP (operands[1], 0)) == LABEL_REF))

> +       return output_move_neon (operands);

> +      else

> +       return "vldrb.<V_sz_elem2> %q0, %E1";

> +    case 5:

> +      return output_move_neon (operands);

> +    case 6:

> +    default:

> +      gcc_unreachable ();

> +      return "";

> +    }

> +}

> +  [(set_attr "type" 

> "mve_move,mve_move,mve_move,mve_move,mve_load,mve_move,mve_move")

> +   (set_attr "length" "4,8,8,4,8,8,4")

> +   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*")

> +   (set_attr "neg_pool_range" "*,*,*,*,996,*,*")])

> +

> +(define_insn "*mve_vstr<mode>"

> +  [(set (match_operand:MVE_types 0 "memory_operand" "=Us")

> +       (match_operand:MVE_types 1 "s_register_operand" "w"))]

> +  "TARGET_HAVE_MVE"

> +  "vstrb.<V_sz_elem> %q1, %E0"

> +  [(set_attr "type" "mve_store")])

> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md

> index 

> 6a0ee28efc9aa9f1fba7b5ae031564f40aa095fe..c23783e0ed914ec21a92828388ada58ada3c6132 

> 100644

> --- a/gcc/config/arm/neon.md

> +++ b/gcc/config/arm/neon.md

> @@ -35,9 +35,9 @@

>

>  (define_insn "*neon_mov<mode>"

>    [(set (match_operand:VDX 0 "nonimmediate_operand"

> -         "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")

> +       "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")

>          (match_operand:VDX 1 "general_operand"

> -         " w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]

> +       " w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]

>    "TARGET_NEON

>     && (register_operand (operands[0], <MODE>mode)

>         || register_operand (operands[1], <MODE>mode))"

> @@ -47,7 +47,7 @@

>        int width, is_valid;

>        static char templ[40];

>

> -      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,

> +      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,

>          &operands[1], &width);

>

>        gcc_assert (is_valid != 0);

> @@ -94,7 +94,7 @@

>        int width, is_valid;

>        static char templ[40];

>

> -      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,

> +      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,

>          &operands[1], &width);

>

>        gcc_assert (is_valid != 0);

> @@ -147,9 +147,9 @@

>  })

>

>  (define_expand "mov<mode>"

> -  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand")

> -       (match_operand:VSTRUCT 1 "general_operand"))]

> -  "TARGET_NEON"

> +  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand" "")

> +       (match_operand:VSTRUCT 1 "general_operand" ""))]

> +  "TARGET_NEON || TARGET_HAVE_MVE"

>  {

>    gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));

>    gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));

> @@ -160,24 +160,28 @@

>      }

>  })

>

> -(define_expand "mov<mode>"

> -  [(set (match_operand:VH 0 "s_register_operand")

> -       (match_operand:VH 1 "s_register_operand"))]

> +;; The pattern mov<mode> where mode is v4hf and v8hf is split into

> +;; movv4hf and movv8hf.  The pattern movv8hf is common for MVE and

> +;; NEON, so it is moved into vec-common.md file.

> +(define_expand "movv4hf"

> +  [(set (match_operand:V4HF 0 "s_register_operand")

> +       (match_operand:V4HF 1 "s_register_operand"))]

>    "TARGET_NEON"

>  {

> -  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));

> -  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));

> +  gcc_checking_assert (aligned_operand (operands[0], E_V4HFmode));

> +  gcc_checking_assert (aligned_operand (operands[1], E_V4HFmode));

>    if (can_create_pseudo_p ())

>      {

>        if (!REG_P (operands[0]))

> -       operands[1] = force_reg (<MODE>mode, operands[1]);

> +       operands[1] = force_reg (E_V4HFmode, operands[1]);

>      }

>  })

>

> +

>  (define_insn "*neon_mov<mode>"

>    [(set (match_operand:VSTRUCT 0 "nonimmediate_operand"        "=w,Ut,w")

>          (match_operand:VSTRUCT 1 "general_operand"      " w,w, Ut"))]

> -  "TARGET_NEON

> +  "(TARGET_NEON || TARGET_HAVE_MVE)

>     && (register_operand (operands[0], <MODE>mode)

>         || register_operand (operands[1], <MODE>mode))"

>  {

> @@ -213,7 +217,7 @@

>  (define_split

>    [(set (match_operand:OI 0 "s_register_operand" "")

>          (match_operand:OI 1 "s_register_operand" ""))]

> -  "TARGET_NEON && reload_completed"

> +  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"

>    [(set (match_dup 0) (match_dup 1))

>     (set (match_dup 2) (match_dup 3))]

>  {

> @@ -254,7 +258,7 @@

>  (define_split

>    [(set (match_operand:XI 0 "s_register_operand" "")

>          (match_operand:XI 1 "s_register_operand" ""))]

> -  "TARGET_NEON && reload_completed"

> +  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"

>    [(set (match_dup 0) (match_dup 1))

>     (set (match_dup 2) (match_dup 3))

>     (set (match_dup 4) (match_dup 5))

> @@ -489,7 +493,7 @@

>  (define_expand "vec_init<mode><V_elem_l>"

>    [(match_operand:VDQ 0 "s_register_operand")

>     (match_operand 1 "" "")]

> -  "TARGET_NEON"

> +  "TARGET_NEON || TARGET_HAVE_MVE"

>  {

>    neon_expand_vector_init (operands[0], operands[1]);

>    DONE;

> diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md

> index 

> 2f0f532edf40d475e4199aa41bd7803fac8d6143..9d74165fe065b03c77918fe9e4611967799535f1 

> 100644

> --- a/gcc/config/arm/predicates.md

> +++ b/gcc/config/arm/predicates.md

> @@ -48,6 +48,16 @@

>    return guard_addr_operand (XEXP (op, 0), mode);

>  })

>

> +(define_predicate "vpr_register_operand"

> +  (match_code "reg,subreg")

> +{

> +  if (GET_CODE (op) == SUBREG)

> +    op = SUBREG_REG (op);

> +  return REG_P (op)

> +         && (REGNO (op) >= FIRST_PSEUDO_REGISTER

> +             || IS_VPR_REGNUM (REGNO (op)));

> +})

> +

>  (define_predicate "imm_for_neon_inv_logic_operand"

>    (match_code "const_vector")

>  {

> @@ -706,7 +716,7 @@

>  (define_predicate "imm_for_neon_mov_operand"

>    (match_code "const_vector,const_int")

>  {

> -  return neon_immediate_valid_for_move (op, mode, NULL, NULL);

> +  return simd_immediate_valid_for_move (op, mode, NULL, NULL);

>  })

>

>  (define_predicate "imm_for_neon_lshift_operand"

> diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm

> index 

> af60c8fc285bb536afeb9ec5c21771a4379755fc..fda5e84355b56a20eb9a22919ab1c786120cc8f1 

> 100644

> --- a/gcc/config/arm/t-arm

> +++ b/gcc/config/arm/t-arm

> @@ -55,6 +55,7 @@ MD_INCLUDES= $(srcdir)/config/arm/arm1020e.md \

>                  $(srcdir)/config/arm/ldmstm.md \

>                  $(srcdir)/config/arm/ldrdstrd.md \

>                  $(srcdir)/config/arm/marvell-f-iwmmxt.md \

> +               $(srcdir)/config/arm/mve.md \

>                  $(srcdir)/config/arm/neon.md \

>                  $(srcdir)/config/arm/predicates.md \

>                  $(srcdir)/config/arm/sync.md \

> diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md

> index 

> 60faad6597935607ed3c5593f941a04bbc924252..c99b846ab387bac633be8b1631f0e40b3c827850 

> 100644

> --- a/gcc/config/arm/types.md

> +++ b/gcc/config/arm/types.md

> @@ -550,6 +550,11 @@

>  ; The classification below is for TME instructions

>  ;

>  ; tme

> +; The classification below is for M-profile Vector Extension instructions

> +;

> +; mve_move

> +; mve_store

> +; mve_load

>

>  (define_attr "type"

>   "adc_imm,\

> @@ -1096,7 +1101,11 @@

>    crypto_sm3,\

>    crypto_sm4,\

>    coproc,\

> -  tme"

> +  tme,\

> +\

> +  mve_move,\

> +  mve_store,\

> +  mve_load"

>     (const_string "untyped"))

>

>  ; Is this an (integer side) multiply with a 32-bit (or smaller) result?

> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md

> index 

> 33ff5627284d7cc898074b562179938982ecc420..5f5c113cf95afafbb733e1bfd2a7c7b8a55651a2 

> 100644

> --- a/gcc/config/arm/vec-common.md

> +++ b/gcc/config/arm/vec-common.md

> @@ -21,8 +21,31 @@

>  ;; Vector Moves

>

>  (define_expand "mov<mode>"

> -  [(set (match_operand:VALL 0 "nonimmediate_operand")

> -       (match_operand:VALL 1 "general_operand"))]

> +  [(set (match_operand:VNIM1 0 "nonimmediate_operand")

> +       (match_operand:VNIM1 1 "general_operand"))]

> +  "TARGET_NEON

> +   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))

> +   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))

> +   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"

> +   {

> +  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));

> +  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));

> +  if (can_create_pseudo_p ())

> +    {

> +      if (!REG_P (operands[0]))

> +       operands[1] = force_reg (<MODE>mode, operands[1]);

> +      else if ((TARGET_NEON || TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT)

> +              && (CONSTANT_P (operands[1])))

> +       {

> +         operands[1] = neon_make_constant (operands[1]);

> +         gcc_assert (operands[1] != NULL_RTX);

> +       }

> +    }

> +})

> +

> +(define_expand "mov<mode>"

> +  [(set (match_operand:VNINOTM1 0 "nonimmediate_operand")

> +       (match_operand:VNINOTM1 1 "general_operand"))]

>    "TARGET_NEON

>     || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"

>  {

> @@ -40,6 +63,20 @@

>      }

>  })

>

> +(define_expand "movv8hf"

> +  [(set (match_operand:V8HF 0 "s_register_operand")

> +       (match_operand:V8HF 1 "s_register_operand"))]

> +   "TARGET_NEON || TARGET_HAVE_MVE_FLOAT"

> +{

> +  gcc_checking_assert (aligned_operand (operands[0], E_V8HFmode));

> +  gcc_checking_assert (aligned_operand (operands[1], E_V8HFmode));

> +   if (can_create_pseudo_p ())

> +     {

> +       if (!REG_P (operands[0]))

> +        operands[1] = force_reg (E_V8HFmode, operands[1]);

> +     }

> +})

> +

>  ;; Vector arithmetic. Expanders are blank, then unnamed insns implement

>  ;; patterns separately for IWMMXT and Neon.

>

> diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md

> index 

> 573db164f01b4ac9ee4a9ee7414872fb93c9e2ca..6349c0570540ec25a599166f5d427fcbdbf2af68 

> 100644

> --- a/gcc/config/arm/vfp.md

> +++ b/gcc/config/arm/vfp.md

> @@ -311,7 +311,7 @@

>     && (   register_operand (operands[0], DImode)

>         || register_operand (operands[1], DImode))

>     && !(TARGET_NEON && CONST_INT_P (operands[1])

> -        && neon_immediate_valid_for_move (operands[1], DImode, NULL, 

> NULL))"

> +       && simd_immediate_valid_for_move (operands[1], DImode, NULL, 

> NULL))"

>    "*

>    switch (which_alternative)

>      {

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..c3f81546c9f14f2491c6fb134170f17bcba16069

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c

> @@ -0,0 +1,27 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +float32x4_t

> +foo32 (float32x4_t value)

> +{

> +  float32x4_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldmia.*" }  } */

> +

> +float16x8_t

> +foo16 (float16x8_t value)

> +{

> +  float16x8_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldmia.*" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..ebee0d2f1ad03b66d044d93bf901e0ce78eccba9

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c

> @@ -0,0 +1,31 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +float32x4_t value;

> +

> +float32x4_t

> +foo32 ()

> +{

> +  float32x4_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldmia.*" }  } */

> +

> +float16x8_t value1;

> +

> +float16x8_t

> +foo16 ()

> +{

> +  float16x8_t b = value1;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldmia.*" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..9b9c84d66ef8fd585a42be1ac7585d8bc6c529bb

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c

> @@ -0,0 +1,27 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve.fp 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +float32x4_t

> +foo32 ()

> +{

> +  float32x4_t b = {10.0, 12.0, 14.0, 16.0};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32*" }  } */

> +

> +float16x8_t

> +foo16 ()

> +{

> +  float16x8_t b = {32.01};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..6b54c3c61f32cf8e0af30272df63f261def0b8c5

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c

> @@ -0,0 +1,49 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +int8x16_t

> +foo8 (int8x16_t value)

> +{

> +  int8x16_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.s8*" }  } */

> +

> +int16x8_t

> +foo16 (int16x8_t value)

> +{

> +  int16x8_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */

> +

> +int32x4_t

> +foo32 (int32x4_t value)

> +{

> +  int32x4_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u32*" }  } */

> +

> +int64x2_t

> +foo64 (int64x2_t value)

> +{

> +  int64x2_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u64*" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..748ddecbd4011bb24058c27cd6a09d66f71ce581

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c

> @@ -0,0 +1,54 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +int8x16_t value1;

> +int16x8_t value2;

> +int32x4_t value3;

> +int64x2_t value4;

> +

> +int8x16_t

> +foo8 ()

> +{

> +  int8x16_t b = value1;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u8*" }  } */

> +

> +int16x8_t

> +foo16 ()

> +{

> +  int16x8_t b = value2;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */

> +

> +int32x4_t

> +foo32 ()

> +{

> +  int32x4_t b = value3;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u32" }  } */

> +

> +int64x2_t

> +foo64 ()

> +{

> +  int64x2_t b = value4;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u64" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..376ec9ee7fc04ddde98719d2605319a378f9a6bb

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c

> @@ -0,0 +1,49 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +int8x16_t

> +foo8 ()

> +{

> +  int8x16_t b = {1, 2, 3, 4};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> +

> +int16x8_t

> +foo16 (int16x8_t value)

> +{

> +  int16x8_t b = {1, 2, 3};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> +

> +int32x4_t

> +foo32 (int32x4_t value)

> +{

> +  int32x4_t b = {1, 2};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> +

> +int64x2_t

> +foo64 (int64x2_t value)

> +{

> +  int64x2_t b = {1};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..f001d14f9ca4c851ed4b3371ae9599d23d2b62ce

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c

> @@ -0,0 +1,49 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +uint8x16_t

> +foo8 (uint8x16_t value)

> +{

> +  uint8x16_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.s8*" }  } */

> +

> +uint16x8_t

> +foo16 (uint16x8_t value)

> +{

> +  uint16x8_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */

> +

> +uint32x4_t

> +foo32 (uint32x4_t value)

> +{

> +  uint32x4_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u32*" }  } */

> +

> +uint64x2_t

> +foo64 (uint64x2_t value)

> +{

> +  uint64x2_t b = value;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u64*" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..56d40668d63ba0b24c08944981c415054494c37d

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c

> @@ -0,0 +1,54 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +uint8x16_t value1;

> +uint16x8_t value2;

> +uint32x4_t value3;

> +uint64x2_t value4;

> +

> +uint8x16_t

> +foo8 ()

> +{

> +  uint8x16_t b = value1;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.s8*" }  } */

> +

> +uint16x8_t

> +foo16 ()

> +{

> +  uint16x8_t b = value2;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u16*" }  } */

> +

> +uint32x4_t

> +foo32 ()

> +{

> +  uint32x4_t b = value3;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u32*" }  } */

> +

> +uint64x2_t

> +foo64 ()

> +{

> +  uint64x2_t b = value4;

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrb.u64*" }  } */

> diff --git 

> a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c 

> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..9ff9b67993ac83cf398880cb65510604a37de6a4

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c

> @@ -0,0 +1,49 @@

> +/* { dg-do compile  } */

> +/* { dg-additional-options "-march=armv8.1-m.main+mve 

> -mfloat-abi=hard"  }  */

> +/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} 

> } */

> +

> +#include "arm_mve.h"

> +

> +uint8x16_t

> +foo8 (uint8x16_t value)

> +{

> +  uint8x16_t b = {1, 2, 3, 4};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> +

> +uint16x8_t

> +foo16 (uint16x8_t value)

> +{

> +  uint16x8_t b = {1, 2, 3};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> +

> +uint32x4_t

> +foo32 (uint32x4_t value)

> +{

> +  uint32x4_t b = {1, 2};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> +

> +uint64x2_t

> +foo64 (uint64x2_t value)

> +{

> +  uint64x2_t b = {1};

> +  return b;

> +}

> +

> +/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]" }  } */

> +/* { dg-final { scan-assembler "vstrb.*" }  } */

> +/* { dg-final { scan-assembler "vldrw.32.*" }  } */

> diff --git a/gcc/testsuite/gcc.target/arm/mve/mve.exp 

> b/gcc/testsuite/gcc.target/arm/mve/mve.exp

> new file mode 100644

> index 

> 0000000000000000000000000000000000000000..77ae3fa292b2892fb22c2f89223ca19dc16ccc99

> --- /dev/null

> +++ b/gcc/testsuite/gcc.target/arm/mve/mve.exp

> @@ -0,0 +1,47 @@

> +# Copyright (C) 2019 Free Software Foundation, Inc.

> +

> +# This program is free software; you can redistribute it and/or modify

> +# it under the terms of the GNU General Public License as published by

> +# the Free Software Foundation; either version 3 of the License, or

> +# (at your option) any later version.

> +#

> +# This program is distributed in the hope that it will be useful,

> +# but WITHOUT ANY WARRANTY; without even the implied warranty of

> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the

> +# GNU General Public License for more details.

> +#

> +# You should have received a copy of the GNU General Public License

> +# along with GCC; see the file COPYING3.  If not see

> +# <http://www.gnu.org/licenses/>.

> +

> +# GCC testsuite that uses the `dg.exp' driver.

> +

> +# Exit immediately if this isn't an ARM target.

> +if ![istarget arm*-*-*] then {

> +  return

> +}

> +

> +# Load support procs.

> +load_lib gcc-dg.exp

> +

> +# If a testcase doesn't have special options, use these.

> +global DEFAULT_CFLAGS

> +if ![info exists DEFAULT_CFLAGS] then {

> +    set DEFAULT_CFLAGS " -ansi -pedantic-errors"

> +}

> +

> +# This variable should only apply to tests called in this exp file.

> +global dg_runtest_extra_prunes

> +set dg_runtest_extra_prunes ""

> +lappend dg_runtest_extra_prunes "warning: switch -m(cpu|arch)=.* 

> conflicts with -m(cpu|arch)=.* switch"

> +

> +# Initialize `dg'.

> +dg-init

> +

> +# Main loop.

> +dg-runtest [lsort [glob -nocomplain 

> $srcdir/$subdir/intrinsics/*.\[cCS\]]] \

> +       "" $DEFAULT_CFLAGS

> +

> +# All done.

> +set dg_runtest_extra_prunes ""

> +dg-finish

>

Patch

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 72f656408f11802c669c3de953bf3020020ca312..c4a7d984936c531d7dfcce347d56b5931913e68b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -344,7 +344,7 @@  arc*-*-*)
 arm*-*-*)
 	cpu_type=arm
 	extra_objs="arm-builtins.o aarch-common.o"
-	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h"
+	extra_headers="mmintrin.h arm_neon.h arm_acle.h arm_fp16.h arm_cmse.h arm_mve.h"
 	target_type_format_char='%'
 	c_target_objs="arm-c.o"
 	cxx_target_objs="arm-c.o"
diff --git a/gcc/config/arm/aout.h b/gcc/config/arm/aout.h
index 72782758853a869bcb9a9d69f3fa0da979cd711f..28cde153f704748f35c84d072b59e9695a61e661 100644
--- a/gcc/config/arm/aout.h
+++ b/gcc/config/arm/aout.h
@@ -53,7 +53,9 @@ 
 /* The assembler's names for the registers.  Note that the ?xx registers are
    there so that VFPv3/NEON registers D16-D31 have the same spacing as D0-D15
    (each of which is overlaid on two S registers), although there are no
-   actual single-precision registers which correspond to D16-D31.  */
+   actual single-precision registers which correspond to D16-D31.  New register
+   p0 is added which is used for MVE predicated cases.  */
+
 #ifndef REGISTER_NAMES
 #define REGISTER_NAMES						\
 {								\
@@ -72,7 +74,7 @@ 
   "wr8",   "wr9",   "wr10",  "wr11",				\
   "wr12",  "wr13",  "wr14",  "wr15",				\
   "wcgr0", "wcgr1", "wcgr2", "wcgr3",				\
-  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge"		\
+  "cc", "vfpcc", "sfp", "afp", "apsrq", "apsrge", "p0"		\
 }
 #endif
 
diff --git a/gcc/config/arm/arm-builtins.c b/gcc/config/arm/arm-builtins.c
index 650b22c7ad916d9abd587981e9ed5809755ee035..d4cb0ea3deb49b10266d1620c85e243ed34aee4d 100644
--- a/gcc/config/arm/arm-builtins.c
+++ b/gcc/config/arm/arm-builtins.c
@@ -667,6 +667,7 @@  enum arm_builtins
   ARM_BUILTIN_SET_FPSCR,
 
   ARM_BUILTIN_CMSE_NONSECURE_CALLER,
+  ARM_BUILTIN_SIMD_LANE_CHECK,
 
 #undef CRYPTO1
 #undef CRYPTO2
@@ -692,7 +693,6 @@  enum arm_builtins
 #include "arm_vfp_builtins.def"
 
   ARM_BUILTIN_NEON_BASE,
-  ARM_BUILTIN_NEON_LANE_CHECK = ARM_BUILTIN_NEON_BASE,
 
 #include "arm_neon_builtins.def"
 
@@ -948,26 +948,35 @@  arm_init_simd_builtin_types (void)
      an entry in our mangling table, consequently, they get default
      mangling.  As a further gotcha, poly8_t and poly16_t are signed
      types, poly64_t and poly128_t are unsigned types.  */
-  arm_simd_polyQI_type_node
-    = build_distinct_type_copy (intQI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
-					     "__builtin_neon_poly8");
-  arm_simd_polyHI_type_node
-    = build_distinct_type_copy (intHI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
-					     "__builtin_neon_poly16");
-  arm_simd_polyDI_type_node
-    = build_distinct_type_copy (unsigned_intDI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
-					     "__builtin_neon_poly64");
-  arm_simd_polyTI_type_node
-    = build_distinct_type_copy (unsigned_intTI_type_node);
-  (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
-					     "__builtin_neon_poly128");
-  /* Prevent front-ends from transforming poly vectors into string
-     literals.  */
-  TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
-  TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
+  if (!TARGET_HAVE_MVE)
+    {
+      arm_simd_polyQI_type_node
+	= build_distinct_type_copy (intQI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyQI_type_node,
+						 "__builtin_neon_poly8");
+      arm_simd_polyHI_type_node
+	= build_distinct_type_copy (intHI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyHI_type_node,
+						 "__builtin_neon_poly16");
+      arm_simd_polyDI_type_node
+	= build_distinct_type_copy (unsigned_intDI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyDI_type_node,
+						 "__builtin_neon_poly64");
+      arm_simd_polyTI_type_node
+	= build_distinct_type_copy (unsigned_intTI_type_node);
+      (*lang_hooks.types.register_builtin_type) (arm_simd_polyTI_type_node,
+						 "__builtin_neon_poly128");
+      /* Init poly vector element types with scalar poly types.  */
+      arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
+      arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
+      arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
+      arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
+
+      /* Prevent front-ends from transforming poly vectors into string
+	 literals.  */
+      TYPE_STRING_FLAG (arm_simd_polyQI_type_node) = false;
+      TYPE_STRING_FLAG (arm_simd_polyHI_type_node) = false;
+    }
 
   /* Init all the element types built by the front-end.  */
   arm_simd_types[Int8x8_t].eltype = intQI_type_node;
@@ -985,11 +994,6 @@  arm_init_simd_builtin_types (void)
   arm_simd_types[Uint32x4_t].eltype = unsigned_intSI_type_node;
   arm_simd_types[Uint64x2_t].eltype = unsigned_intDI_type_node;
 
-  /* Init poly vector element types with scalar poly types.  */
-  arm_simd_types[Poly8x8_t].eltype = arm_simd_polyQI_type_node;
-  arm_simd_types[Poly8x16_t].eltype = arm_simd_polyQI_type_node;
-  arm_simd_types[Poly16x4_t].eltype = arm_simd_polyHI_type_node;
-  arm_simd_types[Poly16x8_t].eltype = arm_simd_polyHI_type_node;
   /* Note: poly64x2_t is defined in arm_neon.h, to ensure it gets default
      mangling.  */
 
@@ -1006,6 +1010,8 @@  arm_init_simd_builtin_types (void)
       tree eltype = arm_simd_types[i].eltype;
       machine_mode mode = arm_simd_types[i].mode;
 
+      if (eltype == NULL)
+	continue;
       if (arm_simd_types[i].itype == NULL)
 	arm_simd_types[i].itype =
 	  build_distinct_type_copy
@@ -1231,15 +1237,6 @@  arm_init_neon_builtins (void)
      system.  */
   arm_init_simd_builtin_scalar_types ();
 
-  tree lane_check_fpr = build_function_type_list (void_type_node,
-						  intSI_type_node,
-						  intSI_type_node,
-						  NULL);
-  arm_builtin_decls[ARM_BUILTIN_NEON_LANE_CHECK] =
-      add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
-			    ARM_BUILTIN_NEON_LANE_CHECK, BUILT_IN_MD,
-			    NULL, NULL_TREE);
-
   for (i = 0; i < ARRAY_SIZE (neon_builtin_data); i++, fcode++)
     {
       arm_builtin_datum *d = &neon_builtin_data[i];
@@ -1956,6 +1953,15 @@  arm_init_builtins (void)
 
   if (TARGET_MAYBE_HARD_FLOAT)
     {
+      tree lane_check_fpr = build_function_type_list (void_type_node,
+						      intSI_type_node,
+						      intSI_type_node,
+						      NULL);
+      arm_builtin_decls[ARM_BUILTIN_SIMD_LANE_CHECK]
+      = add_builtin_function ("__builtin_arm_lane_check", lane_check_fpr,
+			      ARM_BUILTIN_SIMD_LANE_CHECK, BUILT_IN_MD,
+			      NULL, NULL_TREE);
+
       arm_init_neon_builtins ();
       arm_init_vfp_builtins ();
       arm_init_crypto_builtins ();
@@ -2201,6 +2207,47 @@  neon_dereference_pointer (tree exp, tree type, machine_mode mem_mode,
 		      build_int_cst (build_pointer_type (array_type), 0));
 }
 
+/* EXP is a pointer argument to a vector scatter store intrinsics.
+
+   Consider the following example:
+	VSTRW<v>.<dt> Qd, [Qm{, #+/-<imm>}]!
+   When <Qm> used as the base register for the target address,
+   this function is used to derive and return an expression for the
+   accessed memory.
+
+   The intrinsic function operates on a block of registers that has mode
+   REG_MODE.  This block contains vectors of type TYPE_MODE.  The function
+   references the memory at EXP of type TYPE and in mode MEM_MODE.  This
+   mode may be BLKmode if no more suitable mode is available.  */
+
+static tree
+mve_dereference_pointer (tree exp, tree type, machine_mode reg_mode,
+			 machine_mode vector_mode)
+{
+  HOST_WIDE_INT reg_size, vector_size, nelems;
+  tree elem_type, upper_bound, array_type;
+
+  /* Work out the size of each vector in bytes.  */
+  vector_size = GET_MODE_SIZE (vector_mode);
+
+  /* Work out the size of the register block in bytes.  */
+  reg_size = GET_MODE_SIZE (reg_mode);
+
+  /* Work out the type of each element.  */
+  gcc_assert (POINTER_TYPE_P (type));
+  elem_type = TREE_TYPE (type);
+
+  nelems = reg_size / vector_size;
+
+  /* Create a type that describes the full access.  */
+  upper_bound = build_int_cst (size_type_node, nelems - 1);
+  array_type = build_array_type (elem_type, build_index_type (upper_bound));
+
+  /* Dereference EXP using that type.  */
+  return fold_build2 (MEM_REF, array_type, exp,
+		      build_int_cst (build_pointer_type (array_type), 0));
+}
+
 /* Expand a builtin.  */
 static rtx
 arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
@@ -2239,10 +2286,17 @@  arm_expand_builtin_args (rtx target, machine_mode map_mode, int fcode,
             {
               machine_mode other_mode
 		= insn_data[icode].operand[1 - opno].mode;
-              arg[argc] = neon_dereference_pointer (arg[argc],
+	      if (TARGET_HAVE_MVE && mode[argc] != other_mode)
+		{
+		  arg[argc] = mve_dereference_pointer (arg[argc],
 						    TREE_VALUE (formals),
-						    mode[argc], other_mode,
-						    map_mode);
+						    other_mode, map_mode);
+		}
+	      else
+		arg[argc] = neon_dereference_pointer (arg[argc],
+						      TREE_VALUE (formals),
+						      mode[argc], other_mode,
+						      map_mode);
             }
 
 	  /* Use EXPAND_MEMORY for ARG_BUILTIN_MEMORY and
@@ -2548,22 +2602,6 @@  arm_expand_neon_builtin (int fcode, tree exp, rtx target)
       return const0_rtx;
     }
 
-  if (fcode == ARM_BUILTIN_NEON_LANE_CHECK)
-    {
-      /* Builtin is only to check bounds of the lane passed to some intrinsics
-	 that are implemented with gcc vector extensions in arm_neon.h.  */
-
-      tree nlanes = CALL_EXPR_ARG (exp, 0);
-      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
-      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
-      if (CONST_INT_P (lane_idx))
-	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
-      else
-	error ("%Klane index must be a constant immediate", exp);
-      /* Don't generate any RTL.  */
-      return const0_rtx;
-    }
-
   arm_builtin_datum *d
     = &neon_builtin_data[fcode - ARM_BUILTIN_NEON_PATTERN_START];
 
@@ -2625,6 +2663,22 @@  arm_expand_builtin (tree exp,
   int mask;
   int imm;
 
+  if (fcode == ARM_BUILTIN_SIMD_LANE_CHECK)
+    {
+      /* Builtin is only to check bounds of the lane passed to some intrinsics
+	 that are implemented with gcc vector extensions in arm_neon.h.  */
+
+      tree nlanes = CALL_EXPR_ARG (exp, 0);
+      gcc_assert (TREE_CODE (nlanes) == INTEGER_CST);
+      rtx lane_idx = expand_normal (CALL_EXPR_ARG (exp, 1));
+      if (CONST_INT_P (lane_idx))
+	neon_lane_bounds (lane_idx, 0, TREE_INT_CST_LOW (nlanes), exp);
+      else
+	error ("%Klane index must be a constant immediate", exp);
+      /* Don't generate any RTL.  */
+      return const0_rtx;
+    }
+
   if (fcode >= ARM_BUILTIN_ACLE_BASE)
     return arm_expand_acle_builtin (fcode, exp, target);
 
diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 34695fa0112e90e4bdf317da0b9fd1d3194bf0a2..0fe7d371c348818f25901c5d84be94589523c9a6 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -79,6 +79,16 @@  arm_cpu_builtins (struct cpp_reader* pfile)
   def_or_undef_macro (pfile, "__ARM_FEATURE_COMPLEX", TARGET_COMPLEX);
   def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
 
+  cpp_undef (pfile, "__ARM_FEATURE_MVE");
+  if (TARGET_HAVE_MVE && TARGET_HAVE_MVE_FLOAT)
+    {
+      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 3);
+    }
+  else if (TARGET_HAVE_MVE)
+    {
+      builtin_define_with_int_value ("__ARM_FEATURE_MVE", 1);
+    }
+
   cpp_undef (pfile, "__ARM_FEATURE_CMSE");
   if (arm_arch8 && !arm_arch_notm)
     {
diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h
index 5b49049cc45c0bccfa9d67eac0940250fc5dd95a..d4612ae4553697989611d772f7bb0061a04b98b6 100644
--- a/gcc/config/arm/arm-protos.h
+++ b/gcc/config/arm/arm-protos.h
@@ -85,7 +85,7 @@  extern bool ldm_stm_operation_p (rtx, bool, machine_mode mode,
 extern bool clear_operation_p (rtx, bool);
 extern int arm_const_double_rtx (rtx);
 extern int vfp3_const_double_rtx (rtx);
-extern int neon_immediate_valid_for_move (rtx, machine_mode, rtx *, int *);
+extern int simd_immediate_valid_for_move (rtx, machine_mode, rtx *, int *);
 extern int neon_immediate_valid_for_logic (rtx, machine_mode, int, rtx *,
 					   int *);
 extern int neon_immediate_valid_for_shift (rtx, machine_mode, rtx *,
diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 8b07c423fb6b071642fccc48424fe244d97dcbc2..c755df420b52798773ee99f54faf6689d4a16215 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -751,7 +751,8 @@  extern int arm_arch_cmse;
 /*	s0-s15		VFP scratch (aka d0-d7).
 	s16-s31	      S	VFP variable (aka d8-d15).
 	vfpcc		Not a real register.  Represents the VFP condition
-			code flags.  */
+			code flags.
+	vpr		Used to represent MVE VPR predication.  */
 
 /* The stack backtrace structure is as follows:
   fp points to here:  |  save code pointer  |      [fp]
@@ -792,7 +793,7 @@  extern int arm_arch_cmse;
   1,1,1,1,1,1,1,1,		\
   1,1,1,1,			\
   /* Specials.  */		\
-  1,1,1,1,1,1			\
+  1,1,1,1,1,1,1			\
 }
 
 /* 1 for registers not available across function calls.
@@ -822,7 +823,7 @@  extern int arm_arch_cmse;
   1,1,1,1,1,1,1,1,		\
   1,1,1,1,			\
   /* Specials.  */		\
-  1,1,1,1,1,1			\
+  1,1,1,1,1,1,1			\
 }
 
 #ifndef SUBTARGET_CONDITIONAL_REGISTER_USAGE
@@ -998,10 +999,10 @@  extern int arm_arch_cmse;
    && (LAST_VFP_REGNUM - (REGNUM) >= 2 * (N) - 1))
 
 /* The number of hard registers is 16 ARM + 1 CC + 1 SFP + 1 AFP
-   + 1 APSRQ + 1 APSRGE.  */
+   + 1 APSRQ + 1 APSRGE + 1 VPR.  */
 /* Intel Wireless MMX Technology registers add 16 + 4 more.  */
 /* VFP (VFP3) adds 32 (64) + 1 VFPCC.  */
-#define FIRST_PSEUDO_REGISTER   106
+#define FIRST_PSEUDO_REGISTER   107
 
 #define DBX_REGISTER_NUMBER(REGNO) arm_dbx_register_number (REGNO)
 
@@ -1029,11 +1030,26 @@  extern int arm_arch_cmse;
   ((MODE) == V4SImode || (MODE) == V8HImode || (MODE) == V16QImode \
    || (MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DImode)
 
+#define VALID_MVE_MODE(MODE) \
+  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
+   || (MODE) == V16QImode || (MODE) == V8HFmode || (MODE) == V4SFmode \
+   || (MODE) == V2DFmode)
+
+#define VALID_MVE_SI_MODE(MODE) \
+  ((MODE) == V2DImode ||(MODE) == V4SImode || (MODE) == V8HImode \
+   || (MODE) == V16QImode)
+
+#define VALID_MVE_SF_MODE(MODE) \
+  ((MODE) == V8HFmode || (MODE) == V4SFmode || (MODE) == V2DFmode)
+
 /* Structure modes valid for Neon registers.  */
 #define VALID_NEON_STRUCT_MODE(MODE) \
   ((MODE) == TImode || (MODE) == EImode || (MODE) == OImode \
    || (MODE) == CImode || (MODE) == XImode)
 
+#define VALID_MVE_STRUCT_MODE(MODE) \
+  ((MODE) == TImode || (MODE) == OImode || (MODE) == XImode)
+
 /* The register numbers in sequence, for passing to arm_gen_load_multiple.  */
 extern int arm_regs_in_sequence[];
 
@@ -1085,9 +1101,13 @@  extern int arm_regs_in_sequence[];
   /* Registers not for general use.  */		\
   CC_REGNUM, VFPCC_REGNUM,			\
   FRAME_POINTER_REGNUM, ARG_POINTER_REGNUM,	\
-  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM	\
+  SP_REGNUM, PC_REGNUM, APSRQ_REGNUM, APSRGE_REGNUM,	\
+  VPR_REGNUM					\
 }
 
+#define IS_VPR_REGNUM(REGNUM) \
+  ((REGNUM) == VPR_REGNUM)
+
 /* Use different register alloc ordering for Thumb.  */
 #define ADJUST_REG_ALLOC_ORDER arm_order_regs_for_local_alloc ()
 
@@ -1124,6 +1144,7 @@  enum reg_class
   VFPCC_REG,
   SFP_REG,
   AFP_REG,
+  VPR_REG,
   ALL_REGS,
   LIM_REG_CLASSES
 };
@@ -1131,7 +1152,7 @@  enum reg_class
 #define N_REG_CLASSES  (int) LIM_REG_CLASSES
 
 /* Give names of register classes as strings for dump file.  */
-#define REG_CLASS_NAMES  \
+#define REG_CLASS_NAMES \
 {			\
   "NO_REGS",		\
   "LO_REGS",		\
@@ -1151,6 +1172,7 @@  enum reg_class
   "VFPCC_REG",		\
   "SFP_REG",		\
   "AFP_REG",		\
+  "VPR_REG",		\
   "ALL_REGS"		\
 }
 
@@ -1177,7 +1199,8 @@  enum reg_class
   { 0x00000000, 0x00000000, 0x00000000, 0x00000020 }, /* VFPCC_REG */	\
   { 0x00000000, 0x00000000, 0x00000000, 0x00000040 }, /* SFP_REG */	\
   { 0x00000000, 0x00000000, 0x00000000, 0x00000080 }, /* AFP_REG */	\
-  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000000F }  /* ALL_REGS */	\
+  { 0x00000000, 0x00000000, 0x00000000, 0x00000100 }, /* VPR_REG.  */	\
+  { 0xFFFF7FFF, 0xFFFFFFFF, 0xFFFFFFFF, 0x0000010F }  /* ALL_REGS.  */	\
 }
 
 #define FP_SYSREGS \
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 883c2a9179d7e6d69225f8d104228d15702ecef7..6faed76206b93c1a9dea048e2f693dc16ee58072 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -3759,7 +3759,8 @@  arm_options_perform_arch_sanity_checks (void)
       else if (TARGET_HARD_FLOAT_ABI)
 	{
 	  arm_pcs_default = ARM_PCS_AAPCS_VFP;
-	  if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2))
+	  if (!bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2)
+	      && !bitmap_bit_p (arm_active_target.isa, isa_bit_mve))
 	    error ("%<-mfloat-abi=hard%>: selected processor lacks an FPU");
 	}
       else
@@ -4230,7 +4231,7 @@  use_return_insn (int iscond, rtx sibling)
 
   /* Can't be done if any of the VFP regs are pushed,
      since this also requires an insn.  */
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     for (regno = FIRST_VFP_REGNUM; regno <= LAST_VFP_REGNUM; regno++)
       if (df_regs_ever_live_p (regno) && !call_used_or_fixed_reg_p (regno))
 	return 0;
@@ -6289,7 +6290,7 @@  aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, machine_mode mode,
       {
 	pcum->aapcs_vfp_reg_alloc = mask << regno;
 	if (mode == BLKmode
-	    || (mode == TImode && ! TARGET_NEON)
+	    || (mode == TImode && ! (TARGET_NEON || TARGET_HAVE_MVE))
 	    || ! arm_hard_regno_mode_ok (FIRST_VFP_REGNUM + regno, mode))
 	  {
 	    int i;
@@ -6297,7 +6298,7 @@  aapcs_vfp_allocate (CUMULATIVE_ARGS *pcum, machine_mode mode,
 	    int rshift = shift;
 	    machine_mode rmode = pcum->aapcs_vfp_rmode;
 	    rtx par;
-	    if (!TARGET_NEON)
+	    if (!(TARGET_NEON || TARGET_HAVE_MVE))
 	      {
 		/* Avoid using unsupported vector modes.  */
 		if (rmode == V2SImode)
@@ -6343,7 +6344,7 @@  aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
   if (mode == BLKmode
       || (GET_MODE_CLASS (mode) == MODE_INT
 	  && GET_MODE_SIZE (mode) >= GET_MODE_SIZE (TImode)
-	  && !TARGET_NEON))
+	  && !(TARGET_NEON || TARGET_HAVE_MVE)))
     {
       int count;
       machine_mode ag_mode;
@@ -6354,7 +6355,7 @@  aapcs_vfp_allocate_return_reg (enum arm_pcs pcs_variant ATTRIBUTE_UNUSED,
       aapcs_vfp_is_call_or_return_candidate (pcs_variant, mode, type,
 					     &ag_mode, &count);
 
-      if (!TARGET_NEON)
+      if (!(TARGET_NEON || TARGET_HAVE_MVE))
 	{
 	  if (ag_mode == V2SImode)
 	    ag_mode = DImode;
@@ -8253,7 +8254,9 @@  thumb2_legitimate_address_p (machine_mode mode, rtx x, int strict_p)
 		   && CONST_INT_P (XEXP (XEXP (x, 0), 1)))))
     return 1;
 
-  else if (mode == TImode || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode)))
+  else if (mode == TImode
+	   || (TARGET_NEON && VALID_NEON_STRUCT_MODE (mode))
+	   || (TARGET_HAVE_MVE && VALID_MVE_STRUCT_MODE (mode)))
     return 0;
 
   else if (code == PLUS)
@@ -9800,7 +9803,7 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 	  /* Assume that most copies can be done with a single insn,
 	     unless we don't have HW FP, in which case everything
 	     larger than word mode will require two insns.  */
-	  *cost = COSTS_N_INSNS (((!TARGET_HARD_FLOAT
+	  *cost = COSTS_N_INSNS (((!(TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
 				   && GET_MODE_SIZE (mode) > 4)
 				  || mode == DImode)
 				 ? 2 : 1);
@@ -11281,10 +11284,10 @@  arm_rtx_costs_internal (rtx x, enum rtx_code code, enum rtx_code outer_code,
 
     case CONST_VECTOR:
       /* Fixme.  */
-      if (TARGET_NEON
-	  && TARGET_HARD_FLOAT
-	  && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode))
-	  && neon_immediate_valid_for_move (x, mode, NULL, NULL))
+      if (((TARGET_NEON && TARGET_HARD_FLOAT
+	    && (VALID_NEON_DREG_MODE (mode) || VALID_NEON_QREG_MODE (mode)))
+	   || TARGET_HAVE_MVE)
+	  && simd_immediate_valid_for_move (x, mode, NULL, NULL))
 	*cost = COSTS_N_INSNS (1);
       else
 	*cost = COSTS_N_INSNS (4);
@@ -12328,8 +12331,8 @@  vfp3_const_double_rtx (rtx x)
   return vfp3_const_double_index (x) != -1;
 }
 
-/* Recognize immediates which can be used in various Neon instructions. Legal
-   immediates are described by the following table (for VMVN variants, the
+/* Recognize immediates which can be used in various Neon and MVE instructions.
+   Legal immediates are described by the following table (for VMVN variants, the
    bitwise inverse of the constant shown is recognized. In either case, VMOV
    is output and the correct instruction to use for a given constant is chosen
    by the assembler). The constant shown is replicated across all elements of
@@ -12380,7 +12383,7 @@  vfp3_const_double_rtx (rtx x)
    -1 if the given value doesn't match any of the listed patterns.
 */
 static int
-neon_valid_immediate (rtx op, machine_mode mode, int inverse,
+simd_valid_immediate (rtx op, machine_mode mode, int inverse,
 		      rtx *modconst, int *elementwidth)
 {
 #define CHECK(STRIDE, ELSIZE, CLASS, TEST)	\
@@ -12412,6 +12415,10 @@  neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 
   innersize = GET_MODE_UNIT_SIZE (mode);
 
+  /* Only support 128-bit vectors for MVE.  */
+  if (TARGET_HAVE_MVE && (!vector || n_elts * innersize != 16))
+    return -1;
+
   /* Vectors of float constants.  */
   if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT)
     {
@@ -12560,18 +12567,19 @@  neon_valid_immediate (rtx op, machine_mode mode, int inverse,
 #undef CHECK
 }
 
-/* Return TRUE if rtx X is legal for use as either a Neon VMOV (or, implicitly,
-   VMVN) immediate. Write back width per element to *ELEMENTWIDTH (or zero for
-   float elements), and a modified constant (whatever should be output for a
-   VMOV) in *MODCONST.  */
-
+/* Return TRUE if rtx X is legal for use as either a Neon or MVE VMOV (or,
+   implicitly, VMVN) immediate.  Write back width per element to *ELEMENTWIDTH
+   (or zero for float elements), and a modified constant (whatever should be
+   output for a VMOV) in *MODCONST.  "neon_immediate_valid_for_move" function is
+   modified to "simd_immediate_valid_for_move" as this function will be used
+   both by neon and mve.  */
 int
-neon_immediate_valid_for_move (rtx op, machine_mode mode,
+simd_immediate_valid_for_move (rtx op, machine_mode mode,
 			       rtx *modconst, int *elementwidth)
 {
   rtx tmpconst;
   int tmpwidth;
-  int retval = neon_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);
+  int retval = simd_valid_immediate (op, mode, 0, &tmpconst, &tmpwidth);
 
   if (retval == -1)
     return 0;
@@ -12588,7 +12596,7 @@  neon_immediate_valid_for_move (rtx op, machine_mode mode,
 /* Return TRUE if rtx X is legal for use in a VORR or VBIC instruction.  If
    the immediate is valid, write a constant suitable for using as an operand
    to VORR/VBIC/VAND/VORN to *MODCONST and the corresponding element width to
-   *ELEMENTWIDTH. See neon_valid_immediate for description of INVERSE.  */
+   *ELEMENTWIDTH.  See simd_valid_immediate for description of INVERSE.  */
 
 int
 neon_immediate_valid_for_logic (rtx op, machine_mode mode, int inverse,
@@ -12596,7 +12604,7 @@  neon_immediate_valid_for_logic (rtx op, machine_mode mode, int inverse,
 {
   rtx tmpconst;
   int tmpwidth;
-  int retval = neon_valid_immediate (op, mode, inverse, &tmpconst, &tmpwidth);
+  int retval = simd_valid_immediate (op, mode, inverse, &tmpconst, &tmpwidth);
 
   if (retval < 0 || retval > 5)
     return 0;
@@ -12803,7 +12811,7 @@  neon_make_constant (rtx vals)
     gcc_unreachable ();
 
   if (const_vec != NULL
-      && neon_immediate_valid_for_move (const_vec, mode, NULL, NULL))
+      && simd_immediate_valid_for_move (const_vec, mode, NULL, NULL))
     /* Load using VMOV.  On Cortex-A8 this takes one cycle.  */
     return const_vec;
   else if ((target = neon_vdup_constant (vals)) != NULL_RTX)
@@ -13080,6 +13088,15 @@  neon_vector_mem_operand (rtx op, int type, bool strict)
       && (INTVAL (XEXP (ind, 1)) & 3) == 0)
     return TRUE;
 
+  if (type == 1 && TARGET_HAVE_MVE
+      && (GET_CODE (ind) == POST_INC || GET_CODE (ind) == PRE_DEC))
+    {
+      rtx ind1 = XEXP (ind, 0);
+      if (!REG_P (ind1))
+	return 0;
+      return NEON_REGNO_OK_FOR_QUAD (REGNO (ind1));
+    }
+
   return FALSE;
 }
 
@@ -19936,7 +19953,7 @@  output_move_neon (rtx *operands)
     {
     case POST_INC:
       /* We have to use vldm / vstm for too-large modes.  */
-      if (nregs > 4)
+      if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))
 	{
 	  templ = "v%smia%%?\t%%0!, %%h1";
 	  ops[0] = XEXP (addr, 0);
@@ -19965,7 +19982,7 @@  output_move_neon (rtx *operands)
       /* We have to use vldm / vstm for too-large modes.  */
       if (nregs > 1)
 	{
-	  if (nregs > 4)
+	  if (nregs > 4 || (TARGET_HAVE_MVE && nregs >= 2))
 	    templ = "v%smia%%?\t%%m0, %%h1";
 	  else
 	    templ = "v%s1.64\t%%h1, %%A0";
@@ -19980,29 +19997,40 @@  output_move_neon (rtx *operands)
       {
 	int i;
 	int overlap = -1;
-	for (i = 0; i < nregs; i++)
+	if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN)
 	  {
-	    /* We're only using DImode here because it's a convenient size.  */
-	    ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);
-	    ops[1] = adjust_address (mem, DImode, 8 * i);
-	    if (reg_overlap_mentioned_p (ops[0], mem))
+	    sprintf (buff, "v%srw.32\t%%q0, %%1", load ? "ld" : "st");
+	    ops[0] = reg;
+	    ops[1] = mem;
+	    output_asm_insn (buff, ops);
+	  }
+	else
+	  {
+	    for (i = 0; i < nregs; i++)
 	      {
-		gcc_assert (overlap == -1);
-		overlap = i;
+		/* We're only using DImode here because it's a convenient
+		   size.  */
+		ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * i);
+		ops[1] = adjust_address (mem, DImode, 8 * i);
+		if (reg_overlap_mentioned_p (ops[0], mem))
+		  {
+		    gcc_assert (overlap == -1);
+		    overlap = i;
+		  }
+		else
+		  {
+		    sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
+		    output_asm_insn (buff, ops);
+		  }
 	      }
-	    else
+	    if (overlap != -1)
 	      {
+		ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);
+		ops[1] = adjust_address (mem, SImode, 8 * overlap);
 		sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
 		output_asm_insn (buff, ops);
 	      }
 	  }
-	if (overlap != -1)
-	  {
-	    ops[0] = gen_rtx_REG (DImode, REGNO (reg) + 2 * overlap);
-	    ops[1] = adjust_address (mem, SImode, 8 * overlap);
-	    sprintf (buff, "v%sr%%?\t%%P0, %%1", load ? "ld" : "st");
-	    output_asm_insn (buff, ops);
-	  }
 
         return "";
       }
@@ -22223,7 +22251,7 @@  arm_compute_frame_layout (void)
       func_type = arm_current_func_type ();
       /* Space for saved VFP registers.  */
       if (! IS_VOLATILE (func_type)
-	  && TARGET_HARD_FLOAT)
+	  && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))
 	saved += arm_get_vfp_saved_size ();
 
       /* Allocate space for saving/restoring FPCXTNS in Armv8.1-M Mainline
@@ -22447,7 +22475,7 @@  arm_save_coproc_regs(void)
 	saved_size += 8;
       }
 
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     {
       start_reg = FIRST_VFP_REGNUM;
 
@@ -23749,6 +23777,53 @@  arm_print_operand (FILE *stream, rtx x, int code)
       }
       return;
 
+    /* To print the memory operand with "Us" constraint.  Based on the rtx_code
+       the memory operands output looks like following.
+       1. [Rn], #+/-<imm>
+       2. [Rn, #+/-<imm>]!
+       3. [Rn].  */
+    case 'E':
+      {
+	rtx addr;
+	rtx postinc_reg = NULL;
+	unsigned inc_val = 0;
+	enum rtx_code code;
+
+	gcc_assert (MEM_P (x));
+	addr = XEXP (x, 0);
+	code = GET_CODE (addr);
+	if (code == POST_INC || code == POST_DEC || code == PRE_INC
+	    || code  == PRE_DEC)
+	  {
+	    asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
+	    inc_val = GET_MODE_SIZE (GET_MODE (x));
+	    if (code == POST_INC || code == POST_DEC)
+	      asm_fprintf (stream, "], #%s%d",(code == POST_INC)
+					      ? "": "-", inc_val);
+	    else
+	      asm_fprintf (stream, ", #%s%d]!",(code == PRE_INC)
+					       ? "": "-", inc_val);
+	  }
+	else if (code == POST_MODIFY || code == PRE_MODIFY)
+	  {
+	    asm_fprintf (stream, "[%r", REGNO (XEXP (addr, 0)));
+	    postinc_reg = XEXP ( XEXP (x, 1), 1);
+	    if (postinc_reg && CONST_INT_P (postinc_reg))
+	      {
+		if (code == POST_MODIFY)
+		  asm_fprintf (stream, "], #%wd",INTVAL (postinc_reg));
+		else
+		  asm_fprintf (stream, ", #%wd]!",INTVAL (postinc_reg));
+	      }
+	  }
+	else
+	  {
+	    gcc_assert (REG_P (addr));
+	    asm_fprintf (stream, "[%r]",REGNO (addr));
+	  }
+      }
+      return;
+
     case 'C':
       {
 	rtx addr;
@@ -23926,9 +24001,10 @@  arm_print_operand_address (FILE *stream, machine_mode mode, rtx x)
 			 REGNO (XEXP (x, 0)),
 			 GET_CODE (x) == PRE_DEC ? "-" : "",
 			 GET_MODE_SIZE (mode));
+	  else if (TARGET_HAVE_MVE && (mode == OImode || mode == XImode))
+	    asm_fprintf (stream, "[%r]!", REGNO (XEXP (x,0)));
 	  else
-	    asm_fprintf (stream, "[%r], #%s%d",
-			 REGNO (XEXP (x, 0)),
+	    asm_fprintf (stream, "[%r], #%s%d", REGNO (XEXP (x, 0)),
 			 GET_CODE (x) == POST_DEC ? "-" : "",
 			 GET_MODE_SIZE (mode));
 	}
@@ -24773,12 +24849,15 @@  arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 {
   if (GET_MODE_CLASS (mode) == MODE_CC)
     return (regno == CC_REGNUM
-	    || (TARGET_HARD_FLOAT
+	    || ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
 		&& regno == VFPCC_REGNUM));
 
   if (regno == CC_REGNUM && GET_MODE_CLASS (mode) != MODE_CC)
     return false;
 
+  if (IS_VPR_REGNUM (regno))
+    return true;
+
   if (TARGET_THUMB1)
     /* For the Thumb we only allow values bigger than SImode in
        registers 0 - 6, so that there is always a second low
@@ -24787,7 +24866,7 @@  arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
        start of an even numbered register pair.  */
     return (ARM_NUM_REGS (mode) < 2) || (regno < LAST_LO_REGNUM);
 
-  if (TARGET_HARD_FLOAT && IS_VFP_REGNUM (regno))
+  if ((TARGET_HARD_FLOAT || TARGET_HAVE_MVE) && IS_VFP_REGNUM (regno))
     {
       if (mode == SFmode || mode == SImode)
 	return VFP_REGNO_OK_FOR_SINGLE (regno);
@@ -24811,6 +24890,10 @@  arm_hard_regno_mode_ok (unsigned int regno, machine_mode mode)
 	       || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))
 	       || (mode == CImode && NEON_REGNO_OK_FOR_NREGS (regno, 6))
 	       || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8));
+     if (TARGET_HAVE_MVE)
+       return ((VALID_MVE_MODE (mode) && NEON_REGNO_OK_FOR_QUAD (regno))
+	       || (mode == OImode && NEON_REGNO_OK_FOR_NREGS (regno, 4))
+	       || (mode == XImode && NEON_REGNO_OK_FOR_NREGS (regno, 8)));
 
       return false;
     }
@@ -24859,13 +24942,18 @@  arm_modes_tieable_p (machine_mode mode1, machine_mode mode2)
   /* We specifically want to allow elements of "structure" modes to
      be tieable to the structure.  This more general condition allows
      other rarer situations too.  */
-  if (TARGET_NEON
-      && (VALID_NEON_DREG_MODE (mode1)
-	  || VALID_NEON_QREG_MODE (mode1)
-	  || VALID_NEON_STRUCT_MODE (mode1))
-      && (VALID_NEON_DREG_MODE (mode2)
-	  || VALID_NEON_QREG_MODE (mode2)
-	  || VALID_NEON_STRUCT_MODE (mode2)))
+  if ((TARGET_NEON
+       && (VALID_NEON_DREG_MODE (mode1)
+	   || VALID_NEON_QREG_MODE (mode1)
+	   || VALID_NEON_STRUCT_MODE (mode1))
+       && (VALID_NEON_DREG_MODE (mode2)
+	   || VALID_NEON_QREG_MODE (mode2)
+	   || VALID_NEON_STRUCT_MODE (mode2)))
+      || (TARGET_HAVE_MVE
+	  && (VALID_MVE_MODE (mode1)
+	      || VALID_MVE_STRUCT_MODE (mode1))
+	  && (VALID_MVE_MODE (mode2)
+	      || VALID_MVE_STRUCT_MODE (mode2))))
     return true;
 
   return false;
@@ -24880,6 +24968,9 @@  arm_regno_class (int regno)
   if (regno == PC_REGNUM)
     return NO_REGS;
 
+  if (IS_VPR_REGNUM (regno))
+    return VPR_REG;
+
   if (TARGET_THUMB1)
     {
       if (regno == STACK_POINTER_REGNUM)
@@ -26731,7 +26822,7 @@  arm_expand_epilogue_apcs_frame (bool really_return)
         floats_from_frame += 4;
       }
 
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     {
       int start_reg;
       rtx ip_rtx = gen_rtx_REG (SImode, IP_REGNUM);
@@ -26977,7 +27068,7 @@  arm_expand_epilogue (bool really_return)
         }
     }
 
-  if (TARGET_HARD_FLOAT)
+  if (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)
     {
       /* Generate VFP register multi-pop.  */
       int end_reg = LAST_VFP_REGNUM + 1;
@@ -27148,7 +27239,7 @@  arm_expand_epilogue (bool really_return)
 						   GEN_INT (FPCXTNS_ENUM)));
 	  RTX_FRAME_RELATED_P (insn) = 1;
 	}
-      }
+    }
 
   if (!really_return)
     return;
@@ -28370,6 +28461,15 @@  arm_vector_mode_supported_p (machine_mode mode)
       || mode == V2HAmode))
     return true;
 
+  if (TARGET_HAVE_MVE
+      && (mode == V2DImode || mode == V4SImode || mode == V8HImode
+	  || mode == V16QImode))
+      return true;
+
+  if (TARGET_HAVE_MVE_FLOAT
+      && (mode == V2DFmode || mode == V4SFmode || mode == V8HFmode))
+      return true;
+
   return false;
 }
 
@@ -28387,6 +28487,10 @@  arm_array_mode_supported_p (machine_mode mode,
       && (nelems >= 2 && nelems <= 4))
     return true;
 
+  if (TARGET_HAVE_MVE && !BYTES_BIG_ENDIAN
+      && VALID_MVE_MODE (mode) && (nelems == 2 || nelems == 4))
+    return true;
+
   return false;
 }
 
@@ -29435,7 +29539,7 @@  arm_conditional_register_usage (void)
   if (TARGET_THUMB1)
     fixed_regs[LR_REGNUM] = call_used_regs[LR_REGNUM] = 1;
 
-  if (TARGET_32BIT && TARGET_HARD_FLOAT)
+  if (TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE))
     {
       /* VFPv3 registers are disabled when earlier VFP
 	 versions are selected due to the definition of
@@ -29447,6 +29551,8 @@  arm_conditional_register_usage (void)
 	  call_used_regs[regno] = regno < FIRST_VFP_REGNUM + 16
 	    || regno >= FIRST_VFP_REGNUM + 32;
 	}
+      if (TARGET_HAVE_MVE)
+	fixed_regs[VPR_REGNUM] = 0;
     }
 
   if (TARGET_REALLY_IWMMXT && !TARGET_GENERAL_REGS_ONLY)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index c62ad1b360ebecd5368e90ea5634488eef22f2fc..689baa0b0ff63ef90f47d2fd844cb98c9a1457a0 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -41,6 +41,7 @@ 
    (VFPCC_REGNUM    101)	; VFP Condition code pseudo register
    (APSRQ_REGNUM    104)	; Q bit pseudo register
    (APSRGE_REGNUM   105)	; GE bits pseudo register
+   (VPR_REGNUM      106)	; Vector Predication Register - MVE register.
   ]
 )
 ;; 3rd operand to select_dominance_cc_mode
@@ -7293,7 +7294,7 @@ 
   [(set (match_operand:SF 0 "nonimmediate_operand" "=r,r,m")
 	(match_operand:SF 1 "general_operand"  "r,mE,r"))]
   "TARGET_32BIT
-   && TARGET_SOFT_FLOAT
+   && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE
    && (!MEM_P (operands[0])
        || register_operand (operands[1], SFmode))"
 {
@@ -7416,8 +7417,8 @@ 
 
 (define_insn "*movdf_soft_insn"
   [(set (match_operand:DF 0 "nonimmediate_soft_df_operand" "=r,r,r,r,m")
-	(match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]
-  "TARGET_32BIT && TARGET_SOFT_FLOAT
+       (match_operand:DF 1 "soft_df_operand" "rDa,Db,Dc,mF,r"))]
+  "TARGET_32BIT && TARGET_SOFT_FLOAT && !TARGET_HAVE_MVE
    && (   register_operand (operands[0], DFmode)
        || register_operand (operands[1], DFmode))"
   "*
@@ -11681,7 +11682,7 @@ 
                    (match_operand:SI 2 "const_int_I_operand" "I")))
      (set (match_operand:DF 3 "vfp_hard_register_operand" "")
           (mem:DF (match_dup 1)))])]
-  "TARGET_32BIT && TARGET_HARD_FLOAT"
+  "TARGET_32BIT && (TARGET_HARD_FLOAT || TARGET_HAVE_MVE)"
   "*
   {
     int num_regs = XVECLEN (operands[0], 0);
@@ -12624,7 +12625,7 @@ 
    (set_attr "length" "8")]
 )
 
-;; Vector bits common to IWMMXT and Neon
+;; Vector bits common to IWMMXT, Neon and MVE
 (include "vec-common.md")
 ;; Load the Intel Wireless Multimedia Extension patterns
 (include "iwmmxt.md")
@@ -12642,3 +12643,5 @@ 
 (include "sync.md")
 ;; Fixed-point patterns
 (include "arm-fixed.md")
+;; M-profile Vector Extensions
+(include "mve.md")
diff --git a/gcc/config/arm/arm_mve.h b/gcc/config/arm/arm_mve.h
new file mode 100644
index 0000000000000000000000000000000000000000..5ffb466596b5d8fc330616a6fcc7ee37d3e28def
--- /dev/null
+++ b/gcc/config/arm/arm_mve.h
@@ -0,0 +1,59 @@ 
+/* Arm MVE intrinsics include file.
+
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   Contributed by Arm.
+
+   This file is part of GCC.
+
+   GCC is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published
+   by the Free Software Foundation; either version 3, or (at your
+   option) any later version.
+
+   GCC is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+   or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public
+   License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _GCC_ARM_MVE_H
+#define _GCC_ARM_MVE_H
+
+#if !__ARM_FEATURE_MVE
+#error "MVE feature not supported"
+#endif
+
+#include <stdint.h>
+#ifndef  __cplusplus
+#include <stdbool.h>
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#if (__ARM_FEATURE_MVE & 2) /* MVE Floating point.  */
+typedef __fp16 float16_t;
+typedef float float32_t;
+typedef __simd128_float16_t float16x8_t;
+typedef __simd128_float32_t float32x4_t;
+#endif
+
+typedef uint16_t mve_pred16_t;
+typedef __simd128_uint8_t uint8x16_t;
+typedef __simd128_uint16_t uint16x8_t;
+typedef __simd128_uint32_t uint32x4_t;
+typedef __simd128_uint64_t uint64x2_t;
+typedef __simd128_int8_t int8x16_t;
+typedef __simd128_int16_t int16x8_t;
+typedef __simd128_int32_t int32x4_t;
+typedef __simd128_int64_t int64x2_t;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif /* _GCC_ARM_MVE_H.  */
diff --git a/gcc/config/arm/constraints.md b/gcc/config/arm/constraints.md
index 6f309b95cc1874ac7bc69e435781070e0c9cb70a..f77084a0efd489491372bb1dafbc0cd585f0f518 100644
--- a/gcc/config/arm/constraints.md
+++ b/gcc/config/arm/constraints.md
@@ -44,6 +44,8 @@ 
 ;; in Thumb state: Uu, Uw
 ;; in all states: Q
 
+(define_register_constraint "Up" "TARGET_HAVE_MVE ? VPR_REG : NO_REGS"
+  "MVE VPR register")
 
 (define_register_constraint "t" "TARGET_32BIT ? VFP_LO_REGS : NO_REGS"
  "The VFP registers @code{s0}-@code{s31}.")
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index c412851843f4468c2c18bce264288705e076ac50..e30325bc1652d378be2544fa32269c5c4294d7e9 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -62,6 +62,12 @@ 
 ;; Integer and float modes supported by Neon and IWMMXT.
 (define_mode_iterator VALL [V2DI V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
+;; Integer and float modes supported by Neon, IWMMXT and MVE.
+(define_mode_iterator VNIM1 [V16QI V8HI V4SI V4SF V2DI])
+
+;; Integer and float modes supported by Neon and IWMMXT but not MVE.
+(define_mode_iterator VNINOTM1 [V2SI V4HI V8QI V2SF])
+
 ;; Integer and float modes supported by Neon and IWMMXT, except V2DI.
 (define_mode_iterator VALLW [V2SI V4HI V8QI V2SF V4SI V8HI V16QI V4SF])
 
@@ -105,7 +111,8 @@ 
 (define_mode_iterator VQXMOV [V16QI V8HI V8HF V4SI V4SF V2DI TI])
 
 ;; Opaque structure types wider than TImode.
-(define_mode_iterator VSTRUCT [EI OI CI XI])
+(define_mode_iterator VSTRUCT [(EI "!TARGET_HAVE_MVE") OI
+			       (CI "!TARGET_HAVE_MVE") XI])
 
 ;; Opaque structure types used in table lookups (except vtbl1/vtbx1).
 (define_mode_iterator VTAB [TI EI OI])
diff --git a/gcc/config/arm/mve.md b/gcc/config/arm/mve.md
new file mode 100644
index 0000000000000000000000000000000000000000..53334c6d329dedd482615b996232e85ded7a34f8
--- /dev/null
+++ b/gcc/config/arm/mve.md
@@ -0,0 +1,78 @@ 
+;; Arm M-profile Vector Extension Machine Description
+;; Copyright (C) 2019 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; <http://www.gnu.org/licenses/>.
+
+(define_mode_iterator MVE_types [V16QI V8HI V4SI V2DI TI V8HF V4SF V2DF])
+(define_mode_attr V_sz_elem2 [(V16QI "s8") (V8HI "u16") (V4SI "u32")
+			      (V2DI "u64")])
+
+(define_insn "*mve_mov<mode>"
+  [(set (match_operand:MVE_types 0 "s_register_operand" "=w,w,r,w,w,r,w")
+	(match_operand:MVE_types 1 "general_operand" "w,r,w,Dn,Usi,r,Dm"))]
+  "TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT"
+{
+  if (which_alternative == 3 || which_alternative == 6)
+    {
+      int width, is_valid;
+      static char templ[40];
+
+      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
+	&operands[1], &width);
+
+      gcc_assert (is_valid != 0);
+
+      if (width == 0)
+	return "vmov.f32\t%q0, %1  @ <mode>";
+      else
+	sprintf (templ, "vmov.i%d\t%%q0, %%x1  @ <mode>", width);
+      return templ;
+    }
+  switch (which_alternative)
+    {
+    case 0:
+      return "vmov\t%q0, %q1";
+    case 1:
+      return "vmov\t%e0, %Q1, %R1  @ <mode>\;vmov\t%f0, %J1, %K1";
+    case 2:
+      return "vmov\t%Q0, %R0, %e1  @ <mode>\;vmov\t%J0, %K0, %f1";
+    case 4:
+      if ((TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))
+	  || (MEM_P (operands[1])
+	      && GET_CODE (XEXP (operands[1], 0)) == LABEL_REF))
+	return output_move_neon (operands);
+      else
+	return "vldrb.<V_sz_elem2> %q0, %E1";
+    case 5:
+      return output_move_neon (operands);
+    case 6:
+    default:
+      gcc_unreachable ();
+      return "";
+    }
+}
+  [(set_attr "type" "mve_move,mve_move,mve_move,mve_move,mve_load,mve_move,mve_move")
+   (set_attr "length" "4,8,8,4,8,8,4")
+   (set_attr "thumb2_pool_range" "*,*,*,*,1018,*,*")
+   (set_attr "neg_pool_range" "*,*,*,*,996,*,*")])
+
+(define_insn "*mve_vstr<mode>"
+  [(set (match_operand:MVE_types 0 "memory_operand" "=Us")
+	(match_operand:MVE_types 1 "s_register_operand" "w"))]
+  "TARGET_HAVE_MVE"
+  "vstrb.<V_sz_elem> %q1, %E0"
+  [(set_attr "type" "mve_store")])
diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
index 6a0ee28efc9aa9f1fba7b5ae031564f40aa095fe..c23783e0ed914ec21a92828388ada58ada3c6132 100644
--- a/gcc/config/arm/neon.md
+++ b/gcc/config/arm/neon.md
@@ -35,9 +35,9 @@ 
 
 (define_insn "*neon_mov<mode>"
   [(set (match_operand:VDX 0 "nonimmediate_operand"
-	  "=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")
+	"=w,Un,w, w, w,  ?r,?w,?r, ?Us,*r")
 	(match_operand:VDX 1 "general_operand"
-	  " w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]
+	" w,w, Dm,Dn,Uni, w, r, Usi,r,*r"))]
   "TARGET_NEON
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
@@ -47,7 +47,7 @@ 
       int width, is_valid;
       static char templ[40];
 
-      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,
+      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
         &operands[1], &width);
 
       gcc_assert (is_valid != 0);
@@ -94,7 +94,7 @@ 
       int width, is_valid;
       static char templ[40];
 
-      is_valid = neon_immediate_valid_for_move (operands[1], <MODE>mode,
+      is_valid = simd_immediate_valid_for_move (operands[1], <MODE>mode,
         &operands[1], &width);
 
       gcc_assert (is_valid != 0);
@@ -147,9 +147,9 @@ 
 })
 
 (define_expand "mov<mode>"
-  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand")
-	(match_operand:VSTRUCT 1 "general_operand"))]
-  "TARGET_NEON"
+  [(set (match_operand:VSTRUCT 0 "nonimmediate_operand" "")
+	(match_operand:VSTRUCT 1 "general_operand" ""))]
+  "TARGET_NEON || TARGET_HAVE_MVE"
 {
   gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
   gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
@@ -160,24 +160,28 @@ 
     }
 })
 
-(define_expand "mov<mode>"
-  [(set (match_operand:VH 0 "s_register_operand")
-	(match_operand:VH 1 "s_register_operand"))]
+;; The pattern mov<mode> where mode is v4hf and v8hf is split into
+;; movv4hf and movv8hf.  The pattern movv8hf is common for MVE and
+;; NEON, so it is moved into vec-common.md file.
+(define_expand "movv4hf"
+  [(set (match_operand:V4HF 0 "s_register_operand")
+	(match_operand:V4HF 1 "s_register_operand"))]
   "TARGET_NEON"
 {
-  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
-  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[0], E_V4HFmode));
+  gcc_checking_assert (aligned_operand (operands[1], E_V4HFmode));
   if (can_create_pseudo_p ())
     {
       if (!REG_P (operands[0]))
-	operands[1] = force_reg (<MODE>mode, operands[1]);
+	operands[1] = force_reg (E_V4HFmode, operands[1]);
     }
 })
 
+
 (define_insn "*neon_mov<mode>"
   [(set (match_operand:VSTRUCT 0 "nonimmediate_operand"	"=w,Ut,w")
 	(match_operand:VSTRUCT 1 "general_operand"	" w,w, Ut"))]
-  "TARGET_NEON
+  "(TARGET_NEON || TARGET_HAVE_MVE)
    && (register_operand (operands[0], <MODE>mode)
        || register_operand (operands[1], <MODE>mode))"
 {
@@ -213,7 +217,7 @@ 
 (define_split
   [(set (match_operand:OI 0 "s_register_operand" "")
 	(match_operand:OI 1 "s_register_operand" ""))]
-  "TARGET_NEON && reload_completed"
+  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2) (match_dup 3))]
 {
@@ -254,7 +258,7 @@ 
 (define_split
   [(set (match_operand:XI 0 "s_register_operand" "")
 	(match_operand:XI 1 "s_register_operand" ""))]
-  "TARGET_NEON && reload_completed"
+  "(TARGET_NEON || TARGET_HAVE_MVE) && reload_completed"
   [(set (match_dup 0) (match_dup 1))
    (set (match_dup 2) (match_dup 3))
    (set (match_dup 4) (match_dup 5))
@@ -489,7 +493,7 @@ 
 (define_expand "vec_init<mode><V_elem_l>"
   [(match_operand:VDQ 0 "s_register_operand")
    (match_operand 1 "" "")]
-  "TARGET_NEON"
+  "TARGET_NEON || TARGET_HAVE_MVE"
 {
   neon_expand_vector_init (operands[0], operands[1]);
   DONE;
diff --git a/gcc/config/arm/predicates.md b/gcc/config/arm/predicates.md
index 2f0f532edf40d475e4199aa41bd7803fac8d6143..9d74165fe065b03c77918fe9e4611967799535f1 100644
--- a/gcc/config/arm/predicates.md
+++ b/gcc/config/arm/predicates.md
@@ -48,6 +48,16 @@ 
   return guard_addr_operand (XEXP (op, 0), mode);
 })
 
+(define_predicate "vpr_register_operand"
+  (match_code "reg,subreg")
+{
+  if (GET_CODE (op) == SUBREG)
+    op = SUBREG_REG (op);
+  return REG_P (op)
+	  && (REGNO (op) >= FIRST_PSEUDO_REGISTER
+	      || IS_VPR_REGNUM (REGNO (op)));
+})
+
 (define_predicate "imm_for_neon_inv_logic_operand"
   (match_code "const_vector")
 {
@@ -706,7 +716,7 @@ 
 (define_predicate "imm_for_neon_mov_operand"
   (match_code "const_vector,const_int")
 {
-  return neon_immediate_valid_for_move (op, mode, NULL, NULL);
+  return simd_immediate_valid_for_move (op, mode, NULL, NULL);
 })
 
 (define_predicate "imm_for_neon_lshift_operand"
diff --git a/gcc/config/arm/t-arm b/gcc/config/arm/t-arm
index af60c8fc285bb536afeb9ec5c21771a4379755fc..fda5e84355b56a20eb9a22919ab1c786120cc8f1 100644
--- a/gcc/config/arm/t-arm
+++ b/gcc/config/arm/t-arm
@@ -55,6 +55,7 @@  MD_INCLUDES=	$(srcdir)/config/arm/arm1020e.md \
 		$(srcdir)/config/arm/ldmstm.md \
 		$(srcdir)/config/arm/ldrdstrd.md \
 		$(srcdir)/config/arm/marvell-f-iwmmxt.md \
+		$(srcdir)/config/arm/mve.md \
 		$(srcdir)/config/arm/neon.md \
 		$(srcdir)/config/arm/predicates.md \
 		$(srcdir)/config/arm/sync.md \
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 60faad6597935607ed3c5593f941a04bbc924252..c99b846ab387bac633be8b1631f0e40b3c827850 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -550,6 +550,11 @@ 
 ; The classification below is for TME instructions
 ;
 ; tme
+; The classification below is for M-profile Vector Extension instructions
+;
+; mve_move
+; mve_store
+; mve_load
 
 (define_attr "type"
  "adc_imm,\
@@ -1096,7 +1101,11 @@ 
   crypto_sm3,\
   crypto_sm4,\
   coproc,\
-  tme"
+  tme,\
+\
+  mve_move,\
+  mve_store,\
+  mve_load"
    (const_string "untyped"))
 
 ; Is this an (integer side) multiply with a 32-bit (or smaller) result?
diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-common.md
index 33ff5627284d7cc898074b562179938982ecc420..5f5c113cf95afafbb733e1bfd2a7c7b8a55651a2 100644
--- a/gcc/config/arm/vec-common.md
+++ b/gcc/config/arm/vec-common.md
@@ -21,8 +21,31 @@ 
 ;; Vector Moves
 
 (define_expand "mov<mode>"
-  [(set (match_operand:VALL 0 "nonimmediate_operand")
-	(match_operand:VALL 1 "general_operand"))]
+  [(set (match_operand:VNIM1 0 "nonimmediate_operand")
+	(match_operand:VNIM1 1 "general_operand"))]
+  "TARGET_NEON
+   || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE && VALID_MVE_SI_MODE (<MODE>mode))
+   || (TARGET_HAVE_MVE_FLOAT && VALID_MVE_SF_MODE (<MODE>mode))"
+   {
+  gcc_checking_assert (aligned_operand (operands[0], <MODE>mode));
+  gcc_checking_assert (aligned_operand (operands[1], <MODE>mode));
+  if (can_create_pseudo_p ())
+    {
+      if (!REG_P (operands[0]))
+	operands[1] = force_reg (<MODE>mode, operands[1]);
+      else if ((TARGET_NEON || TARGET_HAVE_MVE || TARGET_HAVE_MVE_FLOAT)
+	       && (CONSTANT_P (operands[1])))
+	{
+	  operands[1] = neon_make_constant (operands[1]);
+	  gcc_assert (operands[1] != NULL_RTX);
+	}
+    }
+})
+
+(define_expand "mov<mode>"
+  [(set (match_operand:VNINOTM1 0 "nonimmediate_operand")
+	(match_operand:VNINOTM1 1 "general_operand"))]
   "TARGET_NEON
    || (TARGET_REALLY_IWMMXT && VALID_IWMMXT_REG_MODE (<MODE>mode))"
 {
@@ -40,6 +63,20 @@ 
     }
 })
 
+(define_expand "movv8hf"
+  [(set (match_operand:V8HF 0 "s_register_operand")
+       (match_operand:V8HF 1 "s_register_operand"))]
+   "TARGET_NEON || TARGET_HAVE_MVE_FLOAT"
+{
+  gcc_checking_assert (aligned_operand (operands[0], E_V8HFmode));
+  gcc_checking_assert (aligned_operand (operands[1], E_V8HFmode));
+   if (can_create_pseudo_p ())
+     {
+       if (!REG_P (operands[0]))
+	 operands[1] = force_reg (E_V8HFmode, operands[1]);
+     }
+})
+
 ;; Vector arithmetic. Expanders are blank, then unnamed insns implement
 ;; patterns separately for IWMMXT and Neon.
 
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index 573db164f01b4ac9ee4a9ee7414872fb93c9e2ca..6349c0570540ec25a599166f5d427fcbdbf2af68 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -311,7 +311,7 @@ 
    && (   register_operand (operands[0], DImode)
        || register_operand (operands[1], DImode))
    && !(TARGET_NEON && CONST_INT_P (operands[1])
-        && neon_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
+	&& simd_immediate_valid_for_move (operands[1], DImode, NULL, NULL))"
   "*
   switch (which_alternative)
     {
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c
new file mode 100644
index 0000000000000000000000000000000000000000..c3f81546c9f14f2491c6fb134170f17bcba16069
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float.c
@@ -0,0 +1,27 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo32 (float32x4_t value)
+{
+  float32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
+
+float16x8_t
+foo16 (float16x8_t value)
+{
+  float16x8_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c
new file mode 100644
index 0000000000000000000000000000000000000000..ebee0d2f1ad03b66d044d93bf901e0ce78eccba9
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float1.c
@@ -0,0 +1,31 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t value;
+
+float32x4_t
+foo32 ()
+{
+  float32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
+
+float16x8_t value1;
+
+float16x8_t
+foo16 ()
+{
+  float16x8_t b = value1;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldmia.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9b9c84d66ef8fd585a42be1ac7585d8bc6c529bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_float2.c
@@ -0,0 +1,27 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve.fp -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+float32x4_t
+foo32 ()
+{
+  float32x4_t b = {10.0, 12.0, 14.0, 16.0};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32*" }  } */
+
+float16x8_t
+foo16 ()
+{
+  float16x8_t b = {32.01};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c
new file mode 100644
index 0000000000000000000000000000000000000000..6b54c3c61f32cf8e0af30272df63f261def0b8c5
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int.c
@@ -0,0 +1,49 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo8 (int8x16_t value)
+{
+  int8x16_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.s8*" }  } */
+
+int16x8_t
+foo16 (int16x8_t value)
+{
+  int16x8_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+int32x4_t
+foo32 (int32x4_t value)
+{
+  int32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32*" }  } */
+
+int64x2_t
+foo64 (int64x2_t value)
+{
+  int64x2_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c
new file mode 100644
index 0000000000000000000000000000000000000000..748ddecbd4011bb24058c27cd6a09d66f71ce581
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int1.c
@@ -0,0 +1,54 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t value1;
+int16x8_t value2;
+int32x4_t value3;
+int64x2_t value4;
+
+int8x16_t
+foo8 ()
+{
+  int8x16_t b = value1;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u8*" }  } */
+
+int16x8_t
+foo16 ()
+{
+  int16x8_t b = value2;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+int32x4_t
+foo32 ()
+{
+  int32x4_t b = value3;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32" }  } */
+
+int64x2_t
+foo64 ()
+{
+  int64x2_t b = value4;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c
new file mode 100644
index 0000000000000000000000000000000000000000..376ec9ee7fc04ddde98719d2605319a378f9a6bb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_int2.c
@@ -0,0 +1,49 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+int8x16_t
+foo8 ()
+{
+  int8x16_t b = {1, 2, 3, 4};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+int16x8_t
+foo16 (int16x8_t value)
+{
+  int16x8_t b = {1, 2, 3};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+int32x4_t
+foo32 (int32x4_t value)
+{
+  int32x4_t b = {1, 2};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+int64x2_t
+foo64 (int64x2_t value)
+{
+  int64x2_t b = {1};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c
new file mode 100644
index 0000000000000000000000000000000000000000..f001d14f9ca4c851ed4b3371ae9599d23d2b62ce
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint.c
@@ -0,0 +1,49 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo8 (uint8x16_t value)
+{
+  uint8x16_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.s8*" }  } */
+
+uint16x8_t
+foo16 (uint16x8_t value)
+{
+  uint16x8_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+uint32x4_t
+foo32 (uint32x4_t value)
+{
+  uint32x4_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32*" }  } */
+
+uint64x2_t
+foo64 (uint64x2_t value)
+{
+  uint64x2_t b = value;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c
new file mode 100644
index 0000000000000000000000000000000000000000..56d40668d63ba0b24c08944981c415054494c37d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint1.c
@@ -0,0 +1,54 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t value1;
+uint16x8_t value2;
+uint32x4_t value3;
+uint64x2_t value4;
+
+uint8x16_t
+foo8 ()
+{
+  uint8x16_t b = value1;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.s8*" }  } */
+
+uint16x8_t
+foo16 ()
+{
+  uint16x8_t b = value2;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u16*" }  } */
+
+uint32x4_t
+foo32 ()
+{
+  uint32x4_t b = value3;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u32*" }  } */
+
+uint64x2_t
+foo64 ()
+{
+  uint64x2_t b = value4;
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrb.u64*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c
new file mode 100644
index 0000000000000000000000000000000000000000..9ff9b67993ac83cf398880cb65510604a37de6a4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/mve_vector_uint2.c
@@ -0,0 +1,49 @@ 
+/* { dg-do compile  } */
+/* { dg-additional-options "-march=armv8.1-m.main+mve -mfloat-abi=hard"  }  */
+/* { dg-skip-if "Skip if not auto" {*-*-*} {"-mfpu=*"} {"-mcpu=auto"} } */
+
+#include "arm_mve.h"
+
+uint8x16_t
+foo8 (uint8x16_t value)
+{
+  uint8x16_t b = {1, 2, 3, 4};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+uint16x8_t
+foo16 (uint16x8_t value)
+{
+  uint16x8_t b = {1, 2, 3};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+uint32x4_t
+foo32 (uint32x4_t value)
+{
+  uint32x4_t b = {1, 2};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
+
+uint64x2_t
+foo64 (uint64x2_t value)
+{
+  uint64x2_t b = {1};
+  return b;
+}
+
+/* { dg-final { scan-assembler "vmov\\tq\[0-7\], q\[0-7\]"  }  } */
+/* { dg-final { scan-assembler "vstrb.*" }  } */
+/* { dg-final { scan-assembler "vldrw.32.*" }  } */
diff --git a/gcc/testsuite/gcc.target/arm/mve/mve.exp b/gcc/testsuite/gcc.target/arm/mve/mve.exp
new file mode 100644
index 0000000000000000000000000000000000000000..77ae3fa292b2892fb22c2f89223ca19dc16ccc99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/mve/mve.exp
@@ -0,0 +1,47 @@ 
+# Copyright (C) 2019 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# <http://www.gnu.org/licenses/>.
+
+# GCC testsuite that uses the `dg.exp' driver.
+
+# Exit immediately if this isn't an ARM target.
+if ![istarget arm*-*-*] then {
+  return
+}
+
+# Load support procs.
+load_lib gcc-dg.exp
+
+# If a testcase doesn't have special options, use these.
+global DEFAULT_CFLAGS
+if ![info exists DEFAULT_CFLAGS] then {
+    set DEFAULT_CFLAGS " -ansi -pedantic-errors"
+}
+
+# This variable should only apply to tests called in this exp file.
+global dg_runtest_extra_prunes
+set dg_runtest_extra_prunes ""
+lappend dg_runtest_extra_prunes "warning: switch -m(cpu|arch)=.* conflicts with -m(cpu|arch)=.* switch"
+
+# Initialize `dg'.
+dg-init
+
+# Main loop.
+dg-runtest [lsort [glob -nocomplain $srcdir/$subdir/intrinsics/*.\[cCS\]]] \
+	"" $DEFAULT_CFLAGS
+
+# All done.
+set dg_runtest_extra_prunes ""
+dg-finish