x86: Add support for Intel AMX instructions

Message ID BYAPR11MB303116333F7B693EFE390B8B9E910@BYAPR11MB3031.namprd11.prod.outlook.com
State New

Commit Message

Cui, Lili via Binutils June 28, 2020, 7:43 a.m.
Hi all,
 
This patch enables binutils support for AMX, which will be available in GLC.
INTEL ADVANCED MATRIX EXTENSIONS (AMX).
AMX is a new programming paradigm. It has a set of 2-dimensional registers
(TILES) representing sub-arrays from a larger 2-dimensional memory image, and
instructions that operate on TILES. For more details, please refer to
 https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

"make check-gas" passes.

x86: Add support for Intel AMX instructions

gas/

	* doc/c-i386.texi: Document amx_int8, amx_bf16 and amx_tile.
	* config/tc-i386.c (i386_error): Add invalid_sib_address.
	(cpu_arch): Add .amx-int8, .amx-bf16 and .amx-tile.
	(cpu_noarch): Add noamx_int8, noamx_bf16 and noamx_tile.
	(type_names): Add rTMM.
	(check_VecOperands): Disallow RegIP for non-vector SIB.
	(check_reverse): Handle invalid_sib_address.
	(build_modrm_byte): Handle VEXOP3 and non-vector SIB.
	* testsuite/gas/i386/x86-64-amx-intel.d: New.
	* testsuite/gas/i386/x86-64-amx-sibmem-inval.l: New.
	* testsuite/gas/i386/x86-64-amx-sibmem-inval.s: New.
	* testsuite/gas/i386/x86-64-amx.d: New.
	* testsuite/gas/i386/x86-64-amx.s: New.
	* testsuite/gas/i386/i386.exp: Run above new tests.

opcodes/

	* i386-dis.c (EV): New for generic memory operand.
	(XMT): New.
	(EXtmm): Likewise.
	(Vextmm): Likewise.
	(tmm_mode): Likewise.
	(void_mode): Likewise.
	(REG_VEX_W_0_0F3849_P_0_M_3): Likewise.
	(MOD_VEX_W_0_0F3849_P_0): Likewise.
	(MOD_VEX_W_0_0F3849_P_2): Likewise.
	(MOD_VEX_W_0_0F3849_P_3): Likewise.
	(MOD_VEX_W_0_0F384B_P_1): Likewise.
	(MOD_VEX_W_0_0F384B_P_2): Likewise.
	(MOD_VEX_W_0_0F384B_P_3): Likewise.
	(MOD_VEX_W_0_0F385C_P_1): Likewise.
	(MOD_VEX_W_0_0F385E_P_0): Likewise.
	(MOD_VEX_W_0_0F385E_P_1): Likewise.
	(MOD_VEX_W_0_0F385E_P_2): Likewise.
	(MOD_VEX_W_0_0F385E_P_3): Likewise.
	(RM_VEX_W_0_0F3849_P_0_M_3_R_0): Likewise.
	(PREFIX_VEX_0F3849): Likewise.
	(PREFIX_VEX_0F384B): Likewise.
	(PREFIX_VEX_0F385C): Likewise.
	(PREFIX_VEX_0F385E): Likewise.
	(X86_64_0F01_REG_3): Likewise.
	(X86_64_VEX_W_0_0F3849_P_0_M_0): Likewise.
	(X86_64_0F3849_MOD_3_REG_0_RM_0): Likewise.
	(X86_64_VEX_W_0_0F3849_P_2_M_0): Likewise.
	(X86_64_VEX_W_0_0F3849_P_3_M_0): Likewise.
	(X86_64_MOD_VEX_W_0_0F384B_P_1): Likewise.
	(X86_64_MOD_VEX_W_0_0F384B_P_2): Likewise.
	(X86_64_MOD_VEX_W_0_0F384B_P_3): Likewise.
	(X86_64_MOD_VEX_W_0_0F385C_P_1): Likewise.
	(X86_64_MOD_VEX_W_0_0F385E_P_0): Likewise.
	(X86_64_MOD_VEX_W_0_0F385E_P_1): Likewise.
	(X86_64_MOD_VEX_W_0_0F385E_P_2): Likewise.
	(X86_64_MOD_VEX_W_0_0F385E_P_3): Likewise.
	(VEX_W_0F3849_P_0): Likewise.
	(VEX_W_0F3849_P_2): Likewise.
	(VEX_W_0F3849_P_3): Likewise.
	(VEX_W_0F384B_P_1): Likewise.
	(VEX_W_0F384B_P_2): Likewise.
	(VEX_W_0F384B_P_3): Likewise.
	(VEX_W_0F385C_P_1): Likewise.
	(VEX_W_0F385E_P_0): Likewise.
	(VEX_W_0F385E_P_1): Likewise.
	(VEX_W_0F385E_P_2): Likewise.
	(VEX_W_0F385E_P_3): Likewise.
	(names_tmm): Likewise.
	(att_names_tmm): Likewise.
	(intel_operand_size): Handle void_mode.
	(OP_XMM): Handle tmm_mode.
	(OP_EX): Likewise.
	(OP_VEX): Likewise.
	* i386-gen.c (cpu_flag_init): Add entries for
	CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(operand_type_shorthands): Add RegTMM.
	(operand_type_init): Likewise.
	(operand_types): Add Tmmword.
	(cpu_flag_init): Add CPU_AMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(cpu_flags): Add CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	* i386-opc.h (CpuAMX_INT8): New.
	(CpuAMX_BF16): Likewise.
	(CpuAMX_TILE): Likewise.
	(VEXOP3): Likewise.
	(SIBMEM): Likewise.
	(Tmmword): Likewise.
	(i386_cpu_flags): Add cpuamx_int8, cpuamx_bf16 and cpuamx_tile.
	(i386_opcode_modifier): Extend width of fields vexvvvv and sib.
	(i386_operand_type): Add tmmword.
	* i386-opc.tbl: Add AMX instructions.
	* i386-reg.tbl: Add AMX registers.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
---
 gas/config/tc-i386.c                          |  90 +++++-
 gas/doc/c-i386.texi                           |   7 +
 gas/testsuite/gas/i386/i386.exp               |   3 +
 gas/testsuite/gas/i386/x86-64-amx-intel.d     |  69 +++++
 .../gas/i386/x86-64-amx-sibmem-inval.l        |   7 +
 .../gas/i386/x86-64-amx-sibmem-inval.s        |  12 +
 gas/testsuite/gas/i386/x86-64-amx.d           |  69 +++++
 gas/testsuite/gas/i386/x86-64-amx.s           |  61 ++++
 opcodes/i386-dis.c                            | 290 +++++++++++++++++-
 opcodes/i386-gen.c                            |  18 ++
 opcodes/i386-opc.h                            |  20 +-
 opcodes/i386-opc.tbl                          |  28 ++
 opcodes/i386-reg.tbl                          |   9 +
 13 files changed, 667 insertions(+), 16 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.s

-- 
2.17.1

Comments

Alan Modra via Binutils June 29, 2020, 3:11 a.m. | #1
On Sun, Jun 28, 2020 at 12:44 AM Cui, Lili <lili.cui@intel.com> wrote:
>

> Hi all,

>

> This patch is about to enable binutils support for AMX which would be in GLC.

> INTEL ADVANCED MATRIX EXTENSIONS (AMX).

> AMX is a new programming paradigm, it has a set of 2-dimensional registers

> (TILES) representing sub-arrays from a larger 2-dimensional memory image and

> operate on TILES, more details please refer to

>  https://software.intel.com/content/dam/develop/public/us/en/documents/architecture-instruction-set-extensions-programming-reference.pdf

>

> Make check-gas is ok.

>

> x86: Add support for Intel AMX instructions

>

> gas/

>

>         * doc/c-i386.texi: Document amx_int8, amx_bf16 and amx_tile.

>         * config/tc-i386.c (i386_error): Add invalid_sib_address.

>         (cpu_arch): Add .amx-int8, amx-bf16 and .amx-tile.

>         (cpu_noarch): Add noamx_int8, noamx_bf16 and noamx_tile.

>         (type_names): Add rTMM.

>         (check_VecOperands): Disallow RegIP for non-vector SIB.

>         (check_reverse): Handle invalid_sib_address.

>         (build_modrm_byte): Handle VEXOP3 and non-vector SIB.

>         * testsuite/gas/i386/x86-64-amx-intel.d: New.

>         * testsuite/gas/i386/x86-64-amx-sibmem-inval.l: New.

>         * testsuite/gas/i386/x86-64-amx-sibmem-inval.s: New.

>         * testsuite/gas/i386/x86-64-amx.d: New.

>         * testsuite/gas/i386/x86-64-amx.s: New.

>         * testsuite/gas/i386/i386.exp: Run above new tests.

>

> opcodes/

>

>         * i386-dis.c (EV): New for generic memory operand.

>         (XMT): New.

>         (EXtmm): Likewise.

>         (Vextmm): Likewise.

>         (tmm_mode): Likewise.

>         (void_mode): Likewise.

>         (REG_VEX_W_0_0F3849_P_0_M_3): Likewise.

>         (MOD_VEX_W_0_0F3849_P_0): Likewise.

>         (MOD_VEX_W_0_0F3849_P_2): Likewise.

>         (MOD_VEX_W_0_0F3849_P_3): Likewise.

>         (MOD_VEX_W_0_0F384B_P_1): Likewise.

>         (MOD_VEX_W_0_0F384B_P_2): Likewise.

>         (MOD_VEX_W_0_0F384B_P_3): Likewise.

>         (MOD_VEX_W_0_0F385C_P_1): Likewise.

>         (MOD_VEX_W_0_0F385E_P_0): Likewise.

>         (MOD_VEX_W_0_0F385E_P_1): Likewise.

>         (MOD_VEX_W_0_0F385E_P_2): Likewise.

>         (MOD_VEX_W_0_0F385E_P_3): Likewise.

>         (RM_VEX_W_0_0F3849_P_0_M_3_R_0): Likewise.

>         (PREFIX_VEX_0F3849): Likewise.

>         (PREFIX_VEX_0F384B): Likewise.

>         (PREFIX_VEX_0F385C): Likewise.

>         (PREFIX_VEX_0F385E): Likewise.

>         (X86_64_0F01_REG_3): Likewise.

>         (X86_64_VEX_W_0_0F3849_P_0_M_0): Likewise.

>         (X86_64_0F3849_MOD_3_REG_0_RM_0): Likewise.

>         (X86_64_VEX_W_0_0F3849_P_2_M_0): Likewise.

>         (X86_64_VEX_W_0_0F3849_P_3_M_0): Likewise.

>         (X86_64_MOD_VEX_W_0_0F384B_P_1): Likewise.

>         (X86_64_MOD_VEX_W_0_0F384B_P_2): Likewise.

>         (X86_64_MOD_VEX_W_0_0F384B_P_3): Likewise.

>         (X86_64_MOD_VEX_W_0_0F385C_P_1): Likewise.

>         (X86_64_MOD_VEX_W_0_0F385E_P_0): Likewise.

>         (X86_64_MOD_VEX_W_0_0F385E_P_1): Likewise.

>         (X86_64_MOD_VEX_W_0_0F385E_P_2): Likewise.

>         (X86_64_MOD_VEX_W_0_0F385E_P_3): Likewise.

>         (VEX_W_0F3849_P_0): Likewise.

>         (VEX_W_0F3849_P_2): Likewise.

>         (VEX_W_0F3849_P_3): Likewise.

>         (VEX_W_0F384B_P_1): Likewise.

>         (VEX_W_0F384B_P_2): Likewise.

>         (VEX_W_0F384B_P_3): Likewise.

>         (VEX_W_0F385C_P_1): Likewise.

>         (VEX_W_0F385E_P_0): Likewise.

>         (VEX_W_0F385E_P_1): Likewise.

>         (VEX_W_0F385E_P_2): Likewise.

>         (VEX_W_0F385E_P_3): Likewise.

>         (names_tmm): Likewise.

>         (att_names_tmm): Likewise.

>         (intel_operand_size): Handle void_mode.

>         (OP_XMM): Handle tmm_mode.

>         (OP_EX): Likewise.

>         (OP_VEX): Likewise.

>         * i386-gen.c (cpu_flag_init): Add entries for

>         CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.

>         (operand_type_shorthands): Add RegTMM.

>         (operand_type_init): Likewise.

>         (operand_types): Add Tmmword.

>         (cpu_flag_init): Add CPU_AMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.

>         (cpu_flags): Add CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.

>         * i386-opc.h (CpuAMX_INT8): New.

>         (CpuAMX_BF16): Likewise.

>         (CpuAMX_TILE): Likewise.

>         (VEXOP3): Likewise.

>         (SIBMEM): Likewise.

>         (Tmmword): Likewise.

>         (i386_cpu_flags): Add cpuamx_int8, cpuamx_bf16 and cpuamx_tile.

>         (i386_opcode_modifier): Extend width of fields vexvvvv and sib.

>         (i386_operand_type): Add tmmword.

>         * i386-opc.tbl: Add AMX instructions.

>         * i386-reg.tbl: Add AMX registers.

>         * i386-init.h: Regenerated.

>         * i386-tbl.h: Likewise.


OK.

Thanks.

-- 
H.J.
Jan Beulich June 29, 2020, 8:12 a.m. | #2
On 28.06.2020 09:43, Cui, Lili via Binutils wrote:
> @@ -372,6 +373,9 @@ struct _i386_insn

>      /* Has ZMM register operands.  */

>      bfd_boolean has_regzmm;

>  

> +    /* Has TMM register operands.  */

> +    bfd_boolean has_regtmm;


I'm sad to see this (imo) bad model extended further, but unfortunately
I haven't gotten around to cleaning it up yet. However, adding the field
is pointless as it only ever gets set, but never read.

> @@ -1202,6 +1206,12 @@ static const arch_entry cpu_arch[] =

>      CPU_WAITPKG_FLAGS, 0 },

>    { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,

>      CPU_CLDEMOTE_FLAGS, 0 },

> +  { STRING_COMMA_LEN (".amx-int8"), PROCESSOR_UNKNOWN,

> +    CPU_AMX_INT8_FLAGS, 0 },

> +  { STRING_COMMA_LEN (".amx-bf16"), PROCESSOR_UNKNOWN,

> +    CPU_AMX_BF16_FLAGS, 0 },

> +  { STRING_COMMA_LEN (".amx-tile"), PROCESSOR_UNKNOWN,

> +    CPU_AMX_TILE_FLAGS, 0 },


Why (suitably) dashes here but ...

> @@ -1260,6 +1270,9 @@ static const noarch_entry cpu_noarch[] =

>    { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },

>    { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },

>    { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },

> +  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },

> +  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },

> +  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },


... underscores here?

> @@ -7791,12 +7818,22 @@ build_modrm_byte (void)

>       operands, it must be a instruction with VexNDS.  For a

>       instruction with VexNDD, the destination register is encoded

>       in VEX prefix.  If there are 4 register operands, it must be

> -     a instruction with VEX prefix and 3 sources.  */

> +     a instruction with VEX prefix and 3 sources. For instruction

> +     with 3 register operands, the VEXOP3 indicates we are going

> +     to use VEX.vvvv field to encode the third operand, which is

> +     different from the VEXXDS case where VEX.vvvv is normally used

> +     to encode the second operand. To be clear, the second operand

> +     means operand OP2 and the third operand means operand OP3

> +     in below Intel-syntax assembly code:

> +

> +        INST_OP OP1, OP2, OP3

> +   */

>    if (i.mem_operands == 0

>        && ((i.reg_operands == 2

>  	   && i.tm.opcode_modifier.vexvvvv <= VEXXDS)

>  	  || (i.reg_operands == 3

> -	      && i.tm.opcode_modifier.vexvvvv == VEXXDS)

> +	      && (i.tm.opcode_modifier.vexvvvv == VEXXDS

> +		  || i.tm.opcode_modifier.vexvvvv == VEXOP3))


How is this new case different from e.g. BEXTR? If there's no difference,
I'd suggest not introducing a new pseudo-enumerator.

Also below I'm missing some form of adjustment to check_register()
in this patch. %tmm<N> should not be recognized as register names
(which is even more important in "noprefix" mode) when AMX is
disabled, or outside of 64-bit mode.
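
For readers following the VEX.vvvv discussion, here is a minimal sketch of where the three register operands land in the encoding. It is not gas code: the function name `encode_vex_0f38_rrr` and the `vex_pp` enum are hypothetical, and it only covers registers 0..7 with VEX.W = 0 and VEX.L = 0. The expected bytes are the ones listed in the patch's x86-64-amx-intel.d test.

```c
#include <assert.h>
#include <stdint.h>

/* Mandatory-prefix selector stored in VEX.pp.  */
enum vex_pp { PP_NONE = 0, PP_66 = 1, PP_F3 = 2, PP_F2 = 3 };

/* Hypothetical helper: encode a register-only, 3-operand instruction in
   the 0F38 opcode map using a 3-byte VEX prefix with W=0 and L=0.
   Operands are named as in the Intel-syntax form
   "INST_OP OP1, OP2, OP3".  Register numbers must be 0..7, so the
   inverted R/X/B extension bits all stay set.  */
static void
encode_vex_0f38_rrr (uint8_t out[5], enum vex_pp pp, uint8_t opcode,
                     unsigned op1, unsigned op2, unsigned op3)
{
  assert (op1 < 8 && op2 < 8 && op3 < 8);
  out[0] = 0xc4;                    /* 3-byte VEX escape.  */
  out[1] = 0xe0 | 0x02;             /* ~R ~X ~B = 111, map 0F38.  */
  /* W=0, inverted vvvv holds OP3, L=0, pp selects the prefix.  */
  out[2] = (uint8_t) (((~op3 & 0xf) << 3) | (unsigned) pp);
  out[3] = opcode;
  /* ModRM: mod=11, reg holds OP1, rm holds OP2.  */
  out[4] = (uint8_t) (0xc0 | (op1 << 3) | op2);
}
```

Calling it with `PP_F2`, opcode `0x5e` and operands 1, 2, 3 reproduces the `c4 e2 63 5e ca` bytes shown for `tdpbssd tmm1,tmm2,tmm3` in the test file.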

> --- a/gas/doc/c-i386.texi

> +++ b/gas/doc/c-i386.texi

> @@ -226,6 +226,12 @@ accept various extension mnemonics.  For example,

>  @code{noenqcmd},

>  @code{noserialize},

>  @code{notsxldtrk},

> +@code{amx_int8},

> +@code{noamx_int8},

> +@code{amx_bf16},

> +@code{noamx_bf16},

> +@code{amx_tile},

> +@code{noamx_tile},


There are all underscores here and ...

> @@ -1504,6 +1510,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:

>  @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}

>  @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}

>  @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}

> +@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}


... here, despite the dashes used in gas's cpu_arch[].

> --- /dev/null

> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

> @@ -0,0 +1,69 @@

> +#as:

> +#objdump: -d -Mintel

> +#name: x86_64 AMX insns in Intel syntax

> +#source: x86-64-amx.s

> +

> +.*: +file format .*

> +

> +

> +Disassembly of section \.text:

> +

> +0+ <_start>:

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0

> +[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]

> +[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]

> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]

> +[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]


Is this (not very intuitive) representation agreed with Microsoft's
MASM team? I ask because I'd prefer it to be visually recognizable
that the effective address is _not_ [<base>+<index>*<scale>] here.
The form I'm planning to use in my own disassembler (at least until
knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or
maybe [<base>,<index>*<scale>]).

The AT&T mode representation isn't as strongly hinting at what the
actual EA is, and hence I think it's fine to stay as is.
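
To make the point about the effective address concrete, the following sketch splits a SIB byte into its fields. The struct and function names are hypothetical. The interpretation in the comments (base supplies the tile's start address, index times scale supplies the row stride) is an assumption based on Intel's programming reference, not something this patch states.

```c
#include <assert.h>
#include <stdint.h>

/* Raw fields of a SIB byte, ignoring REX/VEX extension bits.  */
struct sib_fields
{
  unsigned scale;   /* Multiplier: 1, 2, 4 or 8.  */
  unsigned index;   /* Index register number; 4 means "no index".  */
  unsigned base;    /* Base register number.  */
};

/* Hypothetical helper: split a SIB byte into scale, index and base.
   For the tile load/store insns the assumption is that BASE (plus any
   displacement) addresses the first row, while INDEX * SCALE is the
   distance between rows, so the two are not summed into one EA.  */
static struct sib_fields
decode_sib (uint8_t sib)
{
  struct sib_fields f;
  f.scale = 1u << (sib >> 6);
  f.index = (sib >> 3) & 7;
  f.base = sib & 7;
  return f;
}
```

For the SIB byte `0x51` from the test above this yields scale 2, index 2 (rdx) and base 1 (rcx), matching the `[rcx+rdx*2]` form, and `0x21` yields index 4, which the disassembler prints as `riz`.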

> --- /dev/null

> +++ b/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.s

> @@ -0,0 +1,12 @@

> +# Check for SIBMEM operand used in certain AMX instructions

> +

> +    .text

> +_start:

> +    tileloadd (%rip), %tmm1

> +    tileloaddt1 (%rip), %tmm1

> +    tilestored  %tmm1, (%rip)

> +

> +    .intel_syntax noprefix

> +    tileloadd tmm1, [rip]

> +    tileloaddt1 tmm1, [rip]

> +    tilestored  [rip], tmm1


Besides these checks for gas behavior, shouldn't there also be
checks for the disassembler to make sure e.g. SIB-less forms get
output as "(bad)", or the actual arithmetic ops don't get
recognized with e.g. memory operands or wrongly set VEX.L or
VEX.W?
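
A rough sketch of the kind of check being asked for, assuming (as suggested elsewhere in this review) that the AMX encodings require VEX.W = 0 and VEX.L = 0, and that the arithmetic ops are register-only. The function names are hypothetical and this is not how i386-dis.c structures its tables.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Given the two payload bytes that follow the 0xc4 escape, report
   whether the prefix selects map 0F38 with VEX.W = 0 and VEX.L = 0.  */
static bool
vex3_is_0f38_w0_l0 (uint8_t p1, uint8_t p2)
{
  bool map_0f38 = (p1 & 0x1f) == 0x02;  /* mmmmm field.  */
  bool w0 = (p2 & 0x80) == 0;           /* VEX.W is bit 7.  */
  bool l0 = (p2 & 0x04) == 0;           /* VEX.L is bit 2.  */
  return map_0f38 && w0 && l0;
}

/* A register-only form additionally needs ModRM.mod == 3; anything
   else would be a memory operand and should decode as "(bad)".  */
static bool
modrm_is_register_form (uint8_t modrm)
{
  return (modrm >> 6) == 3;
}
```

With the `tdpbssd` bytes from the test (`e2 63`, ModRM `ca`) both checks pass; flipping VEX.W (`e3` as the second payload byte) or VEX.L (`67`) makes the first check fail.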

> @@ -544,8 +548,12 @@ enum

>    ymmq_mode,

>    /* 32-byte YMM or 16-byte word operand */

>    ymmxmm_mode,

> +  /* TMM operand */

> +  tmm_mode,

>    /* d_mode in 32bit, q_mode in 64bit mode.  */

>    m_mode,

> +  /* A generic memory operand.  */

> +  void_mode,


I'm not sure about this: I'd rather see the load/store insns follow
the scatter/gather ones as far as Intel syntax operand size
printing goes. That said, I'd also be happy to see the s/g
ones lose theirs (it is effectively redundant); all I'm striving for is
consistency.

> @@ -749,6 +757,7 @@ enum

>    REG_VEX_0F72,

>    REG_VEX_0F73,

>    REG_VEX_0FAE,

> +  REG_VEX_W_0_0F3849_P_0_M_3,


This looks misnamed; afaict it should be REG_VEX_0F3849_P_0_W_0_M_3.
Suffixes should appear in decode order, to help the reader both
follow the logic and look up related table entries. Similar naming
issues seem to exist further down.

> @@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =

>      "CpuAVX512_BITALG" },

>    { "CPU_ANY_AVX512_BF16_FLAGS",

>      "CpuAVX512_BF16" },

> +  { "CPU_ANY_AMX_INT8_FLAGS",

> +    "CpuAMX_INT8" },

> +  { "CPU_ANY_AMX_BF16_FLAGS",

> +    "CpuAMX_BF16" },

> +  { "CPU_ANY_AMX_TILE_FLAGS",

> +    "CpuAMX_TILE" },


Doesn't this need to include the other two CpuAMX_* flags as well,
since AMX_TILE is the base feature?
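
The dependency being described can be sketched as follows. The flag names and helper functions are hypothetical stand-ins, not the generated CPU_ANY_*_FLAGS machinery. The assumption is that disabling the base tile feature should also disable the two features built on top of it.

```c
#include <assert.h>

/* Hypothetical feature bits standing in for the CpuAMX_* flags.  */
enum
{
  FEAT_AMX_TILE = 1 << 0,
  FEAT_AMX_INT8 = 1 << 1,
  FEAT_AMX_BF16 = 1 << 2
};

/* Bits to clear when FEATURE is disabled.  Turning off the base
   feature also turns off everything that depends on it.  */
static unsigned
disable_mask (unsigned feature)
{
  if (feature == FEAT_AMX_TILE)
    return FEAT_AMX_TILE | FEAT_AMX_INT8 | FEAT_AMX_BF16;
  return feature;
}

/* Return ENABLED with FEATURE and its dependents removed.  */
static unsigned
disable_feature (unsigned enabled, unsigned feature)
{
  return enabled & ~disable_mask (feature);
}
```

Under this model, disabling the tile feature leaves no AMX bits set, while disabling only the INT8 feature leaves the tile and BF16 bits untouched.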

> --- a/opcodes/i386-opc.tbl

> +++ b/opcodes/i386-opc.tbl

> @@ -52,6 +52,7 @@

>  #define RegXMM Class=RegSIMD|Xmmword

>  #define RegYMM Class=RegSIMD|Ymmword

>  #define RegZMM Class=RegSIMD|Zmmword

> +#define RegTMM Class=RegSIMD|Tmmword

>  

>  #define RegMask Class=RegMask

>  

> @@ -80,6 +81,11 @@

>  #define VexW0 VexW=VEXW0

>  #define VexW1 VexW=VEXW1

>  #define VexWIG VexW=VEXWIG

> +#define VexOP3 VexVVVV=VEXOP3

> +#define VexSIB128 SIB=VECSIB128

> +#define VecSIB256 SIB=VECSIB256

> +#define VecSIB512 SIB=VECSIB512

> +#define Sibmem SIB=SIBMEM


The middle three are already present as of 63112cd67b21, aren't
they?

And VexOP3 (if needed at all, see above) shouldn't go in the middle
here, but be at least separated by blank lines to identify it as a
separate group (to be accompanied by VexNDS and alike down the road).

> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

>  

>  // TSXLDTRK instructions end.

> +

> +// AMX instructions.

> +

> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }


Aren't these lacking Vex128 and VexW0? Same for I think all further
entries below; see also the respective test case remark further up.

For Intel syntax these should allow for "qword ptr".

> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> +

> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }


As per an earlier comment I think for Intel syntax these ought to accept
"dword ptr" on their memory operands.

I'd further suggest Sibmem's #define to include Modrm, rather than having
to spell out the latter upon each use.

Also, can you please consistently use either upper or lower case when
spelling hex numbers?

> +tilerelease, 0, 0x49, 0xc0, 1, CpuAMX_TILE|Cpu64, Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { 0 }


As pointed out to H.J. already, this re-introduces an abuse of ImmExt. All
prior abuses had been carefully eliminated.

> --- a/opcodes/i386-reg.tbl

> +++ b/opcodes/i386-reg.tbl

> @@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval

>  zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval

>  zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval

>  zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval

> +// TMM registers for AMX

> +tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval

> +tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval

> +tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval

> +tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval

> +tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval

> +tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval

> +tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval

> +tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval


Is it really the case that there's no way to record use of these
registers in Dwarf or Unwind info?

Jan
Alan Modra via Binutils June 29, 2020, 12:46 p.m. | #3
On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> > @@ -372,6 +373,9 @@ struct _i386_insn

> >      /* Has ZMM register operands.  */

> >      bfd_boolean has_regzmm;

> >

> > +    /* Has TMM register operands.  */

> > +    bfd_boolean has_regtmm;

>

> I'm sad to see widening of this (imo) bad model, but unfortunately

> I didn't get around yet to clean this up. However, adding the field

> is pointless as it only ever gets set, but never read.


I have a patch to use it.

> > @@ -1202,6 +1206,12 @@ static const arch_entry cpu_arch[] =

> >      CPU_WAITPKG_FLAGS, 0 },

> >    { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,

> >      CPU_CLDEMOTE_FLAGS, 0 },

> > +  { STRING_COMMA_LEN (".amx-int8"), PROCESSOR_UNKNOWN,

> > +    CPU_AMX_INT8_FLAGS, 0 },

> > +  { STRING_COMMA_LEN (".amx-bf16"), PROCESSOR_UNKNOWN,

> > +    CPU_AMX_BF16_FLAGS, 0 },

> > +  { STRING_COMMA_LEN (".amx-tile"), PROCESSOR_UNKNOWN,

> > +    CPU_AMX_TILE_FLAGS, 0 },

>

> Why (suitably) dashes here but ...


This should be fixed.

> > @@ -1260,6 +1270,9 @@ static const noarch_entry cpu_noarch[] =

> >    { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },

> >    { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },

> >    { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },

> > +  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },

> > +  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },

> > +  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },

>

> ... underscores here?

>

> > @@ -7791,12 +7818,22 @@ build_modrm_byte (void)

> >       operands, it must be a instruction with VexNDS.  For a

> >       instruction with VexNDD, the destination register is encoded

> >       in VEX prefix.  If there are 4 register operands, it must be

> > -     a instruction with VEX prefix and 3 sources.  */

> > +     a instruction with VEX prefix and 3 sources. For instruction

> > +     with 3 register operands, the VEXOP3 indicates we are going

> > +     to use VEX.vvvv field to encode the third operand, which is

> > +     different from the VEXXDS case where VEX.vvvv is normally used

> > +     to encode the second operand. To be clear, the second operand

> > +     means operand OP2 and the third operand means operand OP3

> > +     in below Intel-syntax assembly code:

> > +

> > +        INST_OP OP1, OP2, OP3

> > +   */

> >    if (i.mem_operands == 0

> >        && ((i.reg_operands == 2

> >          && i.tm.opcode_modifier.vexvvvv <= VEXXDS)

> >         || (i.reg_operands == 3

> > -           && i.tm.opcode_modifier.vexvvvv == VEXXDS)

> > +           && (i.tm.opcode_modifier.vexvvvv == VEXXDS

> > +               || i.tm.opcode_modifier.vexvvvv == VEXOP3))

>

> How is this new case different from e.g. BEXTR? If there's none,

> I'd like to suggest to avoid introducing a new pseudo-enumerator.

>

> Also below I'm missing some form of adjustment to check_register()

> in this patch. %tmm<N> should not be recognized as register names

> (which is even more important in "noprefix" mode) when AMX is

> disabled, or outside of 64-bit mode.


Lili, please take a look.

> > --- a/gas/doc/c-i386.texi

> > +++ b/gas/doc/c-i386.texi

> > @@ -226,6 +226,12 @@ accept various extension mnemonics.  For example,

> >  @code{noenqcmd},

> >  @code{noserialize},

> >  @code{notsxldtrk},

> > +@code{amx_int8},

> > +@code{noamx_int8},

> > +@code{amx_bf16},

> > +@code{noamx_bf16},

> > +@code{amx_tile},

> > +@code{noamx_tile},

>

> There are all underscores here and ...

>

> > @@ -1504,6 +1510,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:

> >  @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}

> >  @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}

> >  @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}

> > +@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}

>

> ... here, despite the dashes used in gas'es cpu_arch[].

>

> > --- /dev/null

> > +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

> > @@ -0,0 +1,69 @@

> > +#as:

> > +#objdump: -d -Mintel

> > +#name: x86_64 AMX insns in Intel syntax

> > +#source: x86-64-amx.s

> > +

> > +.*: +file format .*

> > +

> > +

> > +Disassembly of section \.text:

> > +

> > +0+ <_start>:

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

> > +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

> > +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd tmm5,\[ecx\+eiz\*1\]

> > +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

> > +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd tmm1,\[ecx\+edx\*2\]

>

> Is this (not very intuitive) representation agreed with Microsoft's

> MASM team? I ask because I'd prefer it to be visually recognizable

> that the effective address is _not_ [<base>+<index>*<scale>] here.

> The form I'm planning to use in my own disassembler (at least until

> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or

> maybe [<base>,<index>*<scale>]).


Does this comment apply only to AMX instructions?  Are there any
issues with the disassembler in general?

> The AT&T mode representation isn't as strongly hinting at what the

> actual EA is, and hence I think it's fine to stay as is.

>

> > --- /dev/null

> > +++ b/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.s

> > @@ -0,0 +1,12 @@

> > +# Check for SIBMEM operand used in certain AMX instructions

> > +

> > +    .text

> > +_start:

> > +    tileloadd (%rip), %tmm1

> > +    tileloaddt1 (%rip), %tmm1

> > +    tilestored  %tmm1, (%rip)

> > +

> > +    .intel_syntax noprefix

> > +    tileloadd tmm1, [rip]

> > +    tileloaddt1 tmm1, [rip]

> > +    tilestored  [rip], tmm1

>

> Besides these checks for gas behavior, shouldn't there also be

> checks for the disassembler to make sure e.g. SIB-less forms get

> output as "(bad)", or the actual arithmetic ops don't get

> recognized with e.g. memory operands or wrongly set VEX.L or

> VEX.W?


Yes, VEX.L and VEX.W should be handled properly.   SIB-less check
needs more changes in disassembler infrastructure.  It can be done
in a separate patch.

> > @@ -544,8 +548,12 @@ enum

> >    ymmq_mode,

> >    /* 32-byte YMM or 16-byte word operand */

> >    ymmxmm_mode,

> > +  /* TMM operand */

> > +  tmm_mode,

> >    /* d_mode in 32bit, q_mode in 64bit mode.  */

> >    m_mode,

> > +  /* A generic memory operand.  */

> > +  void_mode,

>

> I'm not sure of this: I'd rather see the load/store insns follow

> the scatter/gather ones as far as Intel syntax operand size

> printing goes. That said, I'd be happy though to see the s/g

> ones lose there (effectively redundant), all I'm striving for is

> consistency.

>

> > @@ -749,6 +757,7 @@ enum

> >    REG_VEX_0F72,

> >    REG_VEX_0F73,

> >    REG_VEX_0FAE,

> > +  REG_VEX_W_0_0F3849_P_0_M_3,

>

> This looks misnamed; afaict it should be REG_VEX_0F3849_P_0_W_0_M_3.

> Suffixes should appear in decode order, to help the reader both

> follow the logic and look up related table entries. Similar naming

> issues exist further down as it seems.


Lili, please take a look.

> > @@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =

> >      "CpuAVX512_BITALG" },

> >    { "CPU_ANY_AVX512_BF16_FLAGS",

> >      "CpuAVX512_BF16" },

> > +  { "CPU_ANY_AMX_INT8_FLAGS",

> > +    "CpuAMX_INT8" },

> > +  { "CPU_ANY_AMX_BF16_FLAGS",

> > +    "CpuAMX_BF16" },

> > +  { "CPU_ANY_AMX_TILE_FLAGS",

> > +    "CpuAMX_TILE" },

>

> Doesn't this need to include the other two CpuAMX_* as well, as

> being the base feature?


Lili, please take a look.

> > --- a/opcodes/i386-opc.tbl

> > +++ b/opcodes/i386-opc.tbl

> > @@ -52,6 +52,7 @@

> >  #define RegXMM Class=RegSIMD|Xmmword

> >  #define RegYMM Class=RegSIMD|Ymmword

> >  #define RegZMM Class=RegSIMD|Zmmword

> > +#define RegTMM Class=RegSIMD|Tmmword

> >

> >  #define RegMask Class=RegMask

> >

> > @@ -80,6 +81,11 @@

> >  #define VexW0 VexW=VEXW0

> >  #define VexW1 VexW=VEXW1

> >  #define VexWIG VexW=VEXWIG

> > +#define VexOP3 VexVVVV=VEXOP3

> > +#define VexSIB128 SIB=VECSIB128

> > +#define VecSIB256 SIB=VECSIB256

> > +#define VecSIB512 SIB=VECSIB512

> > +#define Sibmem SIB=SIBMEM

>

> The middle three are present already as of 63112cd67b21, aren't

> they?

>

> And VexOP3 (if needed at all, see above) shouldn't go in the middle

> here, but be at least separated by blank lines to identify it as a

> separate group (to be accompanied by VexNDS and alike down the road).


Lili, please take a look.

> > @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

> >  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

> >

> >  // TSXLDTRK instructions end.

> > +

> > +// AMX instructions.

> > +

> > +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> > +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>

> Aren't these lacking Vex128 and VexW0? Same for I think all further

> entries below; see also the respective test case remark further up.

>

> For Intel syntax these should allow for "qword ptr".


I don't think it is correct since these 2 instructions take a
64-byte memory location.

>

> > +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

> > +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> > +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> > +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> > +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> > +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> > +

> > +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> > +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> > +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

>

> As per an earlier comment I think for Intel syntax these ought to accept

> "dword ptr" on their memory operands.


See above.

> I'd further suggest Sibmem's #define to include Modrm, rather than having

> to spell out the latter upon each use.


Lili, please take a look.

> Also, can you please consistently use either upper or lower case when

> spelling hex numbers?


Lili, please take a look.

> > +tilerelease, 0, 0x49, 0xc0, 1, CpuAMX_TILE|Cpu64, Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { 0 }

>

> As pointed out to H.J already, this re-introduces an abuse of ImmExt. All

> prior abuses had been carefully eliminated.

>

> > --- a/opcodes/i386-reg.tbl

> > +++ b/opcodes/i386-reg.tbl

> > @@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval

> >  zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval

> >  zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval

> >  zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval

> > +// TMM registers for AMX

> > +tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval

> > +tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval

> > +tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval

> > +tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval

> > +tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval

> > +tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval

> > +tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval

> > +tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval

>

> Is it really the case that there's no way to record use of these

> registers in Dwarf or Unwind info?

>


TMM registers don't have a fixed size.  We are working on the psABI
extension.   We can update it later if needed.

Thanks.

-- 
H.J.
Jan Beulich June 29, 2020, 2:48 p.m. | #4
On 29.06.2020 14:46, H.J. Lu wrote:
> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

>>> --- /dev/null

>>> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

>>> @@ -0,0 +1,69 @@

>>> +#as:

>>> +#objdump: -d -Mintel

>>> +#name: x86_64 AMX insns in Intel syntax

>>> +#source: x86-64-amx.s

>>> +

>>> +.*: +file format .*

>>> +

>>> +

>>> +Disassembly of section \.text:

>>> +

>>> +0+ <_start>:

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

>>> +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd tmm5,\[ecx\+eiz\*1\]

>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd tmm1,\[ecx\+edx\*2\]

>>

>> Is this (not very intuitive) representation agreed with Microsoft's

>> MASM team? I ask because I'd prefer it to be visually recognizable

>> that the effective address is _not_ [<base>+<index>*<scale>] here.

>> The form I'm planning to use in my own disassembler (at least until

>> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or

>> maybe [<base>,<index>*<scale>]).

> 

> Does this comment apply only to AMX instructions?  Are there any

> issues with disassembler in general?


I'm not aware of any others, except perhaps some of the stranger
MPX insns. But with MPX discontinued I don't think there's much point
worrying about them.

>>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

>>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

>>>

>>>  // TSXLDTRK instructions end.

>>> +

>>> +// AMX instructions.

>>> +

>>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>

>> Aren't these lacking Vex128 and VexW0? Same for I think all further

>> entries below; see also the respective test case remark further up.

>>

>> For Intel syntax these should allow for "qword ptr".

> 

> I don't think it is correct since these 2 instructions take a

> 64-memory location.


Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which
I admit would be kind of ugly/misleading. It still would seem desirable
to have a way to explicitly specify memory operand size here, but I have
no good other suggestion for the moment.

>>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

>>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>> +

>>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

>>

>> As per an earlier comment I think for Intel syntax these ought to accept

>> "dword ptr" on their memory operands.

> 

> See above.


How "see above"? The units copied are, aiui, dwords. All larger blocks
combined from these dwords are dynamically sized, and hence can't be
expressed with a static size specifier. Hence "dword ptr" looks
applicable to me here.

Jan
Alan Modra via Binutils June 29, 2020, 3:16 p.m. | #5
On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 29.06.2020 14:46, H.J. Lu wrote:

> > On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

> >> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> >>> --- /dev/null

> >>> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

> >>> @@ -0,0 +1,69 @@

> >>> +#as:

> >>> +#objdump: -d -Mintel

> >>> +#name: x86_64 AMX insns in Intel syntax

> >>> +#source: x86-64-amx.s

> >>> +

> >>> +.*: +file format .*

> >>> +

> >>> +

> >>> +Disassembly of section \.text:

> >>> +

> >>> +0+ <_start>:

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

> >>> +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

> >>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd tmm5,\[ecx\+eiz\*1\]

> >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

> >>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd tmm1,\[ecx\+edx\*2\]

> >>

> >> Is this (not very intuitive) representation agreed with Microsoft's

> >> MASM team? I ask because I'd prefer it to be visually recognizable

> >> that the effective address is _not_ [<base>+<index>*<scale>] here.

> >> The form I'm planning to use in my own disassembler (at least until

> >> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or

> >> maybe [<base>,<index>*<scale>]).

> >

> > Does this comment apply only to AMX instructions?  Are there any

> > issues with disassembler in general?

>

> I'm not aware of any others, perhaps beside some of the more strange

> MPX insns. But with MPX discontinued I don't think there's much point

> worrying about them.


Lili, please take a look.

> >>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

> >>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

> >>>

> >>>  // TSXLDTRK instructions end.

> >>> +

> >>> +// AMX instructions.

> >>> +

> >>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> >>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> >>

> >> Aren't these lacking Vex128 and VexW0? Same for I think all further

> >> entries below; see also the respective test case remark further up.

> >>

> >> For Intel syntax these should allow for "qword ptr".

> >

> > I don't think it is correct since these 2 instructions take a

> > 64-memory location.

>

> Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which

> I admit would be kind of ugly/misleading. It still would seem desirable

> to have a way to explicitly specify memory operand size here, but I have

> no good other suggestion for the moment.


We don't do this for other instructions with 64-byte memory location, like
movdir64b.

> >>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

> >>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>> +

> >>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> >>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> >>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

> >>

> >> As per an earlier comment I think for Intel syntax these ought to accept

> >> "dword ptr" on their memory operands.

> >

> > See above.

>

> How "see above"? The units copied are, aiui, dwords. All larger blocks

> combined from these dwords are dynamically sized, and hence can't be

> expressed with a static size specifier. Hence "dword ptr" looks

> applicable to me here.

>


So far, "dword ptr" means a pointer to dword.  But it isn't the case here.
Also dword isn't the basic unit.


-- 
H.J.
Jan Beulich June 29, 2020, 3:22 p.m. | #6
On 29.06.2020 17:16, H.J. Lu wrote:
> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

>> On 29.06.2020 14:46, H.J. Lu wrote:

>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

>>>>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

>>>>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

>>>>>

>>>>>  // TSXLDTRK instructions end.

>>>>> +

>>>>> +// AMX instructions.

>>>>> +

>>>>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>>>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>>>

>>>> Aren't these lacking Vex128 and VexW0? Same for I think all further

>>>> entries below; see also the respective test case remark further up.

>>>>

>>>> For Intel syntax these should allow for "qword ptr".

>>>

>>> I don't think it is correct since these 2 instructions take a

>>> 64-memory location.

>>

>> Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which

>> I admit would be kind of ugly/misleading. It still would seem desirable

>> to have a way to explicitly specify memory operand size here, but I have

>> no good other suggestion for the moment.

> 

> We don't do this for other instructions with 64-byte memory location, like

> movdir64b.


Well, yes, I'm aware, but I'm not happy with the situation. Still I
can see why this may not warrant addressing right now.

>>>>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

>>>>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>> +

>>>>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>>>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>>>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

>>>>

>>>> As per an earlier comment I think for Intel syntax these ought to accept

>>>> "dword ptr" on their memory operands.

>>>

>>> See above.

>>

>> How "see above"? The units copied are, aiui, dwords. All larger blocks

>> combined from these dwords are dynamically sized, and hence can't be

>> expressed with a static size specifier. Hence "dword ptr" looks

>> applicable to me here.

>>

> 

> So far, "dword ptr" means a pointer to dword.  But it isn't the case here.

> Also dword isn't the basic unit.


Is it not? What does the 'd' suffix in tileloadd and tilestored stand for
then?

Jan
Alan Modra via Binutils June 29, 2020, 3:40 p.m. | #7
On Mon, Jun 29, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 29.06.2020 17:16, H.J. Lu wrote:

> > On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

> >> On 29.06.2020 14:46, H.J. Lu wrote:

> >>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> >>>>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

> >>>>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

> >>>>>

> >>>>>  // TSXLDTRK instructions end.

> >>>>> +

> >>>>> +// AMX instructions.

> >>>>> +

> >>>>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> >>>>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> >>>>

> >>>> Aren't these lacking Vex128 and VexW0? Same for I think all further

> >>>> entries below; see also the respective test case remark further up.

> >>>>

> >>>> For Intel syntax these should allow for "qword ptr".

> >>>

> >>> I don't think it is correct since these 2 instructions take a

> >>> 64-memory location.

> >>

> >> Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which

> >> I admit would be kind of ugly/misleading. It still would seem desirable

> >> to have a way to explicitly specify memory operand size here, but I have

> >> no good other suggestion for the moment.

> >

> > We don't do this for other instructions with 64-byte memory location, like

> > movdir64b.

>

> Well, yes, I'm aware, but I'm not happy with the situation. Still I

> can see why this may not warrant addressing right now.

>

> >>>>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

> >>>>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>> +

> >>>>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> >>>>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> >>>>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

> >>>>

> >>>> As per an earlier comment I think for Intel syntax these ought to accept

> >>>> "dword ptr" on their memory operands.

> >>>

> >>> See above.

> >>

> >> How "see above"? The units copied are, aiui, dwords. All larger blocks

> >> combined from these dwords are dynamically sized, and hence can't be

> >> expressed with a static size specifier. Hence "dword ptr" looks

> >> applicable to me here.

> >>

> >

> > So far, "dword ptr" means a pointer to dword.  But it isn't the case here.

> > Also dword isn't the basic unit.

>

> Is it not? What does the 'd' suffix in tileloadd and tilestored stand for

> then?

>


I will check.  But the AMX spec doesn't have any dword operations in
tileloadd or tilestored.

-- 
H.J.
Jan Beulich June 30, 2020, 6:29 a.m. | #8
On 28.06.2020 09:43, Cui, Lili via Binutils wrote:
> @@ -3153,6 +3201,16 @@ static const char *att_names_zmm[] = {

>    "%zmm28", "%zmm29", "%zmm30", "%zmm31"

>  };

>  

> +static const char **names_tmm;

> +static const char *intel_names_tmm[] = {

> +  "tmm0", "tmm1", "tmm2", "tmm3",

> +  "tmm4", "tmm5", "tmm6", "tmm7"

> +};

> +static const char *att_names_tmm[] = {

> +  "%tmm0", "%tmm1", "%tmm2", "%tmm3",

> +  "%tmm4", "%tmm5", "%tmm6", "%tmm7"

> +};


Upon further consideration I don't think this and ...

> --- a/opcodes/i386-reg.tbl

> +++ b/opcodes/i386-reg.tbl

> @@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval

>  zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval

>  zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval

>  zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval

> +// TMM registers for AMX

> +tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval

> +tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval

> +tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval

> +tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval

> +tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval

> +tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval

> +tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval

> +tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval


... this is quite sufficient: How many registers there are depends on
the selected palette, and I don't think gas should needlessly restrict
encoding options. The disassembler needs to handle the high bits of
the register encoding fields in any event - whether by properly
decoding the high bits or by properly considering the encodings as
(bad) is secondary there (but should of course be in line with the
choice on the assembler side).
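
To illustrate the second option (treating out-of-range encodings as "(bad)"), here is a minimal sketch in C. It is not code from the patch: the helper names vex_vvvv and tmm_name are hypothetical, and the limit of eight registers simply mirrors the tmm0..tmm7 entries quoted above. It relies only on the architectural fact that VEX.vvvv sits in bits 6:3 of the last VEX prefix byte and is stored inverted.

```c
#include <stdint.h>

/* Hypothetical helper: return the register number held in VEX.vvvv.
   The field is bits 6:3 of the last VEX prefix byte, stored inverted.  */
static unsigned
vex_vvvv (uint8_t vex_last)
{
  return (~(unsigned) vex_last >> 3) & 0xf;
}

/* Hypothetical policy: only tmm0..tmm7 are named; any register number
   with a high bit set is reported as "(bad)".  */
static const char *
tmm_name (unsigned regno)
{
  static const char *const names[8] = {
    "tmm0", "tmm1", "tmm2", "tmm3",
    "tmm4", "tmm5", "tmm6", "tmm7"
  };

  return regno < 8 ? names[regno] : "(bad)";
}
```

For the byte sequence c4 e2 52 5c dc from the test expectations (tdpbf16ps tmm3,tmm4,tmm5), vex_vvvv (0x52) yields 5, i.e. tmm5. A prefix byte such as 0x12 yields 13, which this sketch would print as "(bad)".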

Jan
Alan Modra via Binutils June 30, 2020, 9:12 a.m. | #9
> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

> >

> > On 29.06.2020 14:46, H.J. Lu wrote:

> > > On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

> > >> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> > >>> --- /dev/null

> > >>> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

> > >>> @@ -0,0 +1,69 @@

> > >>> +#as:

> > >>> +#objdump: -d -Mintel

> > >>> +#name: x86_64 AMX insns in Intel syntax

> > >>> +#source: x86-64-amx.s

> > >>> +

> > >>> +.*: +file format .*

> > >>> +

> > >>> +

> > >>> +Disassembly of section \.text:

> > >>> +

> > >>> +0+ <_start>:

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

> > >>> +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

> > >>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd tmm5,\[ecx\+eiz\*1\]

> > >>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

> > >>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd tmm1,\[ecx\+edx\*2\]

> > >>

> > >> Is this (not very intuitive) representation agreed with Microsoft's

> > >> MASM team? I ask because I'd prefer it to be visually recognizable

> > >> that the effective address is _not_ [<base>+<index>*<scale>] here.

> > >> The form I'm planning to use in my own disassembler (at least until

> > >> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or maybe

> > >> [<base>,<index>*<scale>]).

> > >

> > > Does this comment apply only to AMX instructions?  Are there any

> > > issues with disassembler in general?

> >

> > I'm not aware of any others, perhaps beside some of the more strange

> > MPX insns. But with MPX discontinued I don't think there's much point

> > worrying about them.

> 

> Lili, please take a look.


Hi Jan, 

Could you help figure out which format I should use? Also, what does the "n" mean in the second format? Thanks.
[<base>+<index>*<scale>]
[<base>+<index>*<scale>n]
[<base>,<index>*<scale>]

> > @@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =

> >      "CpuAVX512_BITALG" },

> >    { "CPU_ANY_AVX512_BF16_FLAGS",

> >      "CpuAVX512_BF16" },

> > +  { "CPU_ANY_AMX_INT8_FLAGS",

> > +    "CpuAMX_INT8" },

> > +  { "CPU_ANY_AMX_BF16_FLAGS",

> > +    "CpuAMX_BF16" },

> > +  { "CPU_ANY_AMX_TILE_FLAGS",

> > +    "CpuAMX_TILE" },

>

> Doesn't this need to include the other two CpuAMX_* as well, as being 

> the base feature?


Sorry, I didn't get what you mean.

Thanks,
Lili.
Jan Beulich June 30, 2020, 9:33 a.m. | #10
On 30.06.2020 11:12, Cui, Lili wrote:
>> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>

>>> On 29.06.2020 14:46, H.J. Lu wrote:

>>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

>>>>>> --- /dev/null

>>>>>> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

>>>>>> @@ -0,0 +1,69 @@

>>>>>> +#as:

>>>>>> +#objdump: -d -Mintel

>>>>>> +#name: x86_64 AMX insns in Intel syntax

>>>>>> +#source: x86-64-amx.s

>>>>>> +

>>>>>> +.*: +file format .*

>>>>>> +

>>>>>> +

>>>>>> +Disassembly of section \.text:

>>>>>> +

>>>>>> +0+ <_start>:

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

>>>>>> +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

>>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd tmm5,\[ecx\+eiz\*1\]

>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

>>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd tmm1,\[ecx\+edx\*2\]

>>>>>

>>>>> Is this (not very intuitive) representation agreed with Microsoft's

>>>>> MASM team? I ask because I'd prefer it to be visually recognizable

>>>>> that the effective address is _not_ [<base>+<index>*<scale>] here.

>>>>> The form I'm planning to use in my own disassembler (at least until

>>>>> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or maybe

>>>>> [<base>,<index>*<scale>]).

>>>>

>>>> Does this comment apply only to AMX instructions?  Are there any

>>>> issues with disassembler in general?

>>>

>>> I'm not aware of any others, perhaps beside some of the more strange

>>> MPX insns. But with MPX discontinued I don't think there's much point

>>> worrying about them.

>>

>> Lili, please take a look.

> 

> Hi Jan, 

> 

> Could you help figure out which format I should use? and what's the "n" meaning it the second format, Thanks.

> [<base>+<index>*<scale>]

> [<base>+<index>*<scale>n]

> [<base>,<index>*<scale>]


See my original question: "Is this (not very intuitive) representation
agreed with Microsoft's MASM team?" I have no contacts there, but I
would assume you (Intel) do.

The 'n' was meant to be a literal character 'n', standing for what
the operation sections of the insns call "start" (while "stride" is
<index>*<scale>).
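
As a rough illustration of why [<base>+<index>*<scale>] is not the effective address here, the following C sketch computes the address of row n of a tile under the start/stride reading described above. The struct tile_sib and the function tile_row_address are hypothetical names introduced only for this example; they model register values rather than any binutils data structure.

```c
#include <stdint.h>

/* Hypothetical model of a SIB-addressed tile memory operand.  */
struct tile_sib
{
  uint64_t base;        /* value of the base register */
  uint64_t index;       /* value of the index register, 0 if none */
  unsigned scale_log2;  /* SIB.scale: 0..3 for *1, *2, *4, *8 */
  int64_t disp;         /* displacement */
};

/* Address of row N: the row counter N plays the role of the 'n' in
   [<base>+<index>*<scale>n], and <index>*<scale> is the stride.  */
static uint64_t
tile_row_address (const struct tile_sib *m, unsigned n)
{
  uint64_t membegin = m->base + (uint64_t) m->disp;
  uint64_t stride = m->index << m->scale_log2;

  return membegin + (uint64_t) n * stride;
}
```

With base 0x1000, index 0x40 and scale *2, row 0 is read from 0x1000 and row 3 from 0x1180. A plain [base+index*scale] reading would suggest a single access at 0x1080.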

>>> @@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =

>>>      "CpuAVX512_BITALG" },

>>>    { "CPU_ANY_AVX512_BF16_FLAGS",

>>>      "CpuAVX512_BF16" },

>>> +  { "CPU_ANY_AMX_INT8_FLAGS",

>>> +    "CpuAMX_INT8" },

>>> +  { "CPU_ANY_AMX_BF16_FLAGS",

>>> +    "CpuAMX_BF16" },

>>> +  { "CPU_ANY_AMX_TILE_FLAGS",

>>> +    "CpuAMX_TILE" },

>>

>> Doesn't this need to include the other two CpuAMX_* as well, as being 

>> the base feature?

> 

> Sorry, I didn't get what do you mean.


Just like e.g.

  { "CPU_ANY_AVX512F_FLAGS",
    "CpuAVX512F|CpuAVX512CD|CpuAVX512ER|CpuAVX512PF|CpuAVX512DQ|CpuAVX512BW|CpuAVX512VL|CpuAVX512IFMA|CpuAVX512VBMI|CpuAVX512_4FMAPS|CpuAVX512_4VNNIW|CpuAVX512_VPOPCNTDQ|CpuAVX512_VBMI2|CpuAVX512_VNNI|CpuAVX512_BITALG|CpuAVX512_BF16|CpuAVX512_VP2INTERSECT" },

results in all dependent features getting disabled when AVX512F gets disabled,
I think disabling AMX-TILE should also lead to disabling of AMX-INT8 and
AMX-BF16.
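
The dependency idea can be sketched in C as follows. This is only an illustration under assumed names: the FEAT_* bits, the ANY_* masks and disable_feature are hypothetical and do not correspond to the generated i386-init.h contents. The point is that the "any" mask of a base feature lists its dependents, so clearing it clears them too.

```c
#include <stdint.h>

/* Hypothetical feature bits, one per AMX sub-feature.  */
enum
{
  FEAT_AMX_TILE = 1u << 0,
  FEAT_AMX_INT8 = 1u << 1,
  FEAT_AMX_BF16 = 1u << 2
};

/* Hypothetical "any" masks: the bits to clear when a feature is
   disabled.  The base feature's mask includes its dependents.  */
#define ANY_AMX_INT8 (FEAT_AMX_INT8)
#define ANY_AMX_BF16 (FEAT_AMX_BF16)
#define ANY_AMX_TILE (FEAT_AMX_TILE | FEAT_AMX_INT8 | FEAT_AMX_BF16)

/* Clear every bit named in ANY_MASK from the enabled set.  */
static uint32_t
disable_feature (uint32_t enabled, uint32_t any_mask)
{
  return enabled & ~any_mask;
}
```

Starting from all three features enabled, disabling with ANY_AMX_TILE leaves nothing enabled, while disabling with ANY_AMX_INT8 leaves AMX-TILE and AMX-BF16 in place.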

Jan
Jan Beulich June 30, 2020, 9:48 a.m. | #11
On 29.06.2020 17:40, H.J. Lu wrote:
> On Mon, Jun 29, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:

>>

>> On 29.06.2020 17:16, H.J. Lu wrote:

>>> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>> On 29.06.2020 14:46, H.J. Lu wrote:

>>>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

>>>>>>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

>>>>>>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

>>>>>>>

>>>>>>>  // TSXLDTRK instructions end.

>>>>>>> +

>>>>>>> +// AMX instructions.

>>>>>>> +

>>>>>>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>>>>>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>>>>>

>>>>>> Aren't these lacking Vex128 and VexW0? Same for I think all further

>>>>>> entries below; see also the respective test case remark further up.

>>>>>>

>>>>>> For Intel syntax these should allow for "qword ptr".

>>>>>

>>>>> I don't think it is correct since these 2 instructions take a

>>>>> 64-memory location.

>>>>

>>>> Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which

>>>> I admit would be kind of ugly/misleading. It still would seem desirable

>>>> to have a way to explicitly specify memory operand size here, but I have

>>>> no good other suggestion for the moment.

>>>

>>> We don't do this for other instructions with 64-byte memory location, like

>>> movdir64b.

>>

>> Well, yes, I'm aware, but I'm not happy with the situation. Still I

>> can see why this may not warrant addressing right now.

>>

>>>>>>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

>>>>>>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>> +

>>>>>>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>>>>>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>>>>>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

>>>>>>

>>>>>> As per an earlier comment I think for Intel syntax these ought to accept

>>>>>> "dword ptr" on their memory operands.

>>>>>

>>>>> See above.

>>>>

>>>> How "see above"? The units copied are, aiui, dwords. All larger blocks

>>>> combined from these dwords are dynamically sized, and hence can't be

>>>> expressed with a static size specifier. Hence "dword ptr" looks

>>>> applicable to me here.

>>>>

>>>

>>> So far, "dword ptr" means a pointer to dword.  But it isn't the case here.

>>> Also dword isn't the basic unit.

>>

>> Is it not? What does the 'd' suffix in tileloadd and tilestored stand for

>> then?

> 

> I will check.  But AMX spec doesn't have any dword operations in

> tileloadd nor tilestored.


It's not very explicit, but the exception section has various mentions
along the lines of "#UD if tsrc.colbytes mod 4 != 0". I.e. while
arithmetic happens on int8 / bf16 units, organization is still in
dword granularity. Also see the description of the dot product insns,
which describe how "each dword" gets interpreted by these insns.
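
As a small sketch of that constraint (the helper names are hypothetical, and only the "colbytes mod 4" condition quoted above is modeled): a row is acceptable only if its byte count is a whole number of dwords.

```c
#include <stdbool.h>

/* Hypothetical check mirroring "#UD if tsrc.colbytes mod 4 != 0":
   a row must consist of whole dwords.  */
static bool
tile_colbytes_valid (unsigned int colbytes)
{
  return colbytes % 4 == 0;
}

/* Number of dwords per row; meaningful only when the check above holds.  */
static unsigned int
tile_row_dwords (unsigned int colbytes)
{
  return colbytes / 4;
}
```

For a 64-byte row this yields 16 dwords, whereas a 6-byte row would be rejected.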

Jan
Alan Modra via Binutils June 30, 2020, 12:20 p.m. | #12
On Tue, Jun 30, 2020 at 2:48 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 29.06.2020 17:40, H.J. Lu wrote:

> > On Mon, Jun 29, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>

> >> On 29.06.2020 17:16, H.J. Lu wrote:

> >>> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>> On 29.06.2020 14:46, H.J. Lu wrote:

> >>>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> >>>>>>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

> >>>>>>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

> >>>>>>>

> >>>>>>>  // TSXLDTRK instructions end.

> >>>>>>> +

> >>>>>>> +// AMX instructions.

> >>>>>>> +

> >>>>>>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> >>>>>>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

> >>>>>>

> >>>>>> Aren't these lacking Vex128 and VexW0? Same for I think all further

> >>>>>> entries below; see also the respective test case remark further up.

> >>>>>>

> >>>>>> For Intel syntax these should allow for "qword ptr".

> >>>>>

> >>>>> I don't think it is correct since these 2 instructions take a

> >>>>> 64-memory location.

> >>>>

> >>>> Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which

> >>>> I admit would be kind of ugly/misleading. It still would seem desirable

> >>>> to have a way to explicitly specify memory operand size here, but I have

> >>>> no good other suggestion for the moment.

> >>>

> >>> We don't do this for other instructions with 64-byte memory location, like

> >>> movdir64b.

> >>

> >> Well, yes, I'm aware, but I'm not happy with the situation. Still I

> >> can see why this may not warrant addressing right now.

> >>

> >>>>>>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

> >>>>>>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>>>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>>>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>>>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>>>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

> >>>>>>> +

> >>>>>>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> >>>>>>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

> >>>>>>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

> >>>>>>

> >>>>>> As per an earlier comment I think for Intel syntax these ought to accept

> >>>>>> "dword ptr" on their memory operands.

> >>>>>

> >>>>> See above.

> >>>>

> >>>> How "see above"? The units copied are, aiui, dwords. All larger blocks

> >>>> combined from these dwords are dynamically sized, and hence can't be

> >>>> expressed with a static size specifier. Hence "dword ptr" looks

> >>>> applicable to me here.

> >>>>

> >>>

> >>> So far, "dword ptr" means a pointer to dword.  But it isn't the case here.

> >>> Also dword isn't the basic unit.

> >>

> >> Is it not? What does the 'd' suffix in tileloadd and tilestored stand for

> >> then?

> >

> > I will check.  But AMX spec doesn't have any dword operations in

> > tileloadd nor tilestored.

>

> It's not very explicit, but the exception section has various mentions

> along the lines of "#UD if tsrc.colbytes mod 4 != 0". I.e. while

> arithmetic happens on int8 / bf16 units, organization is still in

> dword granularity. Also see the description of the dot product insns,

> which describe how "each dword" gets interpreted by these insns.

>


But we don't use "dword ptr" on vector instructions with dword granularity.

-- 
H.J.
Jan Beulich June 30, 2020, 12:25 p.m. | #13
On 30.06.2020 14:20, H.J. Lu wrote:
> On Tue, Jun 30, 2020 at 2:48 AM Jan Beulich <jbeulich@suse.com> wrote:

>>

>> On 29.06.2020 17:40, H.J. Lu wrote:

>>> On Mon, Jun 29, 2020 at 8:22 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>

>>>> On 29.06.2020 17:16, H.J. Lu wrote:

>>>>> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>>> On 29.06.2020 14:46, H.J. Lu wrote:

>>>>>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

>>>>>>>>> @@ -4093,3 +4099,25 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|

>>>>>>>>>  xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }

>>>>>>>>>

>>>>>>>>>  // TSXLDTRK instructions end.

>>>>>>>>> +

>>>>>>>>> +// AMX instructions.

>>>>>>>>> +

>>>>>>>>> +ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>>>>>>>> +sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }

>>>>>>>>

>>>>>>>> Aren't these lacking Vex128 and VexW0? Same for I think all further

>>>>>>>> entries below; see also the respective test case remark further up.

>>>>>>>>

>>>>>>>> For Intel syntax these should allow for "qword ptr".

>>>>>>>

>>>>>>> I don't think it is correct since these 2 instructions take a

>>>>>>> 64-memory location.

>>>>>>

>>>>>> Oh, sorry for mixing bits and bytes. Should be "zmmword ptr" then, which

>>>>>> I admit would be kind of ugly/misleading. It still would seem desirable

>>>>>> to have a way to explicitly specify memory operand size here, but I have

>>>>>> no good other suggestion for the moment.

>>>>>

>>>>> We don't do this for other instructions with 64-byte memory location, like

>>>>> movdir64b.

>>>>

>>>> Well, yes, I'm aware, but I'm not happy with the situation. Still I

>>>> can see why this may not warrant addressing right now.

>>>>

>>>>>>>>> +// Use VexOP3 to indicate we are going to use Vex.vvvv field to encode the third operand.

>>>>>>>>> +tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>>>> +tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>>>> +tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>>>> +tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>>>> +tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }

>>>>>>>>> +

>>>>>>>>> +tileloadd, 2, 0xf24B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>>>>>>>> +tileloaddt1, 2, 0x664B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }

>>>>>>>>> +tilestored, 2, 0xf34B, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }

>>>>>>>>

>>>>>>>> As per an earlier comment I think for Intel syntax these ought to accept

>>>>>>>> "dword ptr" on their memory operands.

>>>>>>>

>>>>>>> See above.

>>>>>>

>>>>>> How "see above"? The units copied are, aiui, dwords. All larger blocks

>>>>>> combined from these dwords are dynamically sized, and hence can't be

>>>>>> expressed with a static size specifier. Hence "dword ptr" looks

>>>>>> applicable to me here.

>>>>>>

>>>>>

>>>>> So far, "dword ptr" means a pointer to dword.  But it isn't the case here.

>>>>> Also dword isn't the basic unit.

>>>>

>>>> Is it not? What does the 'd' suffix in tileloadd and tilestored stand for

>>>> then?

>>>

>>> I will check.  But AMX spec doesn't have any dword operations in

>>> tileloadd nor tilestored.

>>

>> It's not very explicit, but the exception section has various mentions

>> along the lines of "#UD if tsrc.colbytes mod 4 != 0". I.e. while

>> arithmetic happens on int8 / bf16 units, organization is still in

>> dword granularity. Also see the description of the dot product insns,

>> which describe how "each dword" gets interpreted by these insns.

> 

> But we don't use "dword ptr" on vector instructions with dword granularity.


Which is why I compared it to the scatter/gather subset instead, where we
do. No larger granularity of types is known at build time, so this is as
good as it can get. But I'm not going to insist, especially as long as I
don't know what MASM will permit / refuse.

Jan
Alan Modra via Binutils June 30, 2020, 12:26 p.m. | #14
On Mon, Jun 29, 2020 at 11:29 PM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> > @@ -3153,6 +3201,16 @@ static const char *att_names_zmm[] = {

> >    "%zmm28", "%zmm29", "%zmm30", "%zmm31"

> >  };

> >

> > +static const char **names_tmm;

> > +static const char *intel_names_tmm[] = {

> > +  "tmm0", "tmm1", "tmm2", "tmm3",

> > +  "tmm4", "tmm5", "tmm6", "tmm7"

> > +};

> > +static const char *att_names_tmm[] = {

> > +  "%tmm0", "%tmm1", "%tmm2", "%tmm3",

> > +  "%tmm4", "%tmm5", "%tmm6", "%tmm7"

> > +};

>

> Upon further consideration I don't think this and ...

>

> > --- a/opcodes/i386-reg.tbl

> > +++ b/opcodes/i386-reg.tbl

> > @@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval

> >  zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval

> >  zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval

> >  zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval

> > +// TMM registers for AMX

> > +tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval

> > +tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval

> > +tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval

> > +tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval

> > +tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval

> > +tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval

> > +tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval

> > +tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval

>

> ... this is quite sufficient: How many registers there are depends on

> the selected palette, and I don't think gas should needlessly restrict

> encoding options. The disassembler needs to handle the high bits of

> the register encoding fields in any event - whether by properly

> decoding the high bits or by properly considering the encodings as

> (bad) is secondary there (but should of course be in line with the

> choice on the assembler side).

>


Disassembler should check the high bits of the register encoding fields.
But i386-reg.tbl is used by assembler.   I don't see anything wrong with
tmm in i386-reg.tbl.

-- 
H.J.
Alan Modra via Binutils June 30, 2020, 12:31 p.m. | #15
On Tue, Jun 30, 2020 at 2:33 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 30.06.2020 11:12, Cui, Lili wrote:

> >> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>

> >>> On 29.06.2020 14:46, H.J. Lu wrote:

> >>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> >>>>>> --- /dev/null

> >>>>>> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

> >>>>>> @@ -0,0 +1,69 @@

> >>>>>> +#as:

> >>>>>> +#objdump: -d -Mintel

> >>>>>> +#name: x86_64 AMX insns in Intel syntax

> >>>>>> +#source: x86-64-amx.s

> >>>>>> +

> >>>>>> +.*: +file format .*

> >>>>>> +

> >>>>>> +

> >>>>>> +Disassembly of section \.text:

> >>>>>> +

> >>>>>> +0+ <_start>:

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

> >>>>>> +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

> >>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd

> >> tmm5,\[ecx\+eiz\*1\]

> >>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

> >>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd

> >> tmm1,\[ecx\+edx\*2\]

> >>>>>

> >>>>> Is this (not very intuitive) representation agreed with Microsoft's

> >>>>> MASM team? I ask because I'd prefer it to be visually recognizable

> >>>>> that the effective address is _not_ [<base>+<index>*<scale>] here.

> >>>>> The form I'm planning to use in my own disassembler (at least until

> >>>>> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or maybe

> >>>>> [<base>,<index>*<scale>]).

> >>>>

> >>>> Does this comment apply only to AMX instructions?  Are there any

> >>>> issues with disassembler in general?

> >>>

> >>> I'm not aware of any others, perhaps beside some of the more strange

> >>> MPX insns. But with MPX discontinued I don't think there's much point

> >>> worrying about them.

> >>

> >> Lili, please take a look.

> >

> > Hi Jan,

> >

> > Could you help figure out which format I should use? and what's the "n" meaning it the second format, Thanks.

> > [<base>+<index>*<scale>]

> > [<base>+<index>*<scale>n]

> > [<base>,<index>*<scale>]

>

> See my original question: "Is this (not very intuitive) representation

> agreed with Microsoft's MASM team?" I have no contacts there, but I

> would be assuming you (Intel) have.


MASM doesn't always support new ISAs.

> The 'n' was meant to be a literal character 'n', standing for what

> the operation sections of the insns call "start" (while "stride" is

> <index>*<scale>).

>


We don't invent new assembly syntax for AMX.  Disassembler should match
what assembler accepts.

-- 
H.J.
Jan Beulich June 30, 2020, 4:29 p.m. | #16
On 30.06.2020 14:31, H.J. Lu wrote:
> On Tue, Jun 30, 2020 at 2:33 AM Jan Beulich <jbeulich@suse.com> wrote:

>>

>> On 30.06.2020 11:12, Cui, Lili wrote:

>>>> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>>

>>>>> On 29.06.2020 14:46, H.J. Lu wrote:

>>>>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

>>>>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

>>>>>>>> --- /dev/null

>>>>>>>> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

>>>>>>>> @@ -0,0 +1,69 @@

>>>>>>>> +#as:

>>>>>>>> +#objdump: -d -Mintel

>>>>>>>> +#name: x86_64 AMX insns in Intel syntax

>>>>>>>> +#source: x86-64-amx.s

>>>>>>>> +

>>>>>>>> +.*: +file format .*

>>>>>>>> +

>>>>>>>> +

>>>>>>>> +Disassembly of section \.text:

>>>>>>>> +

>>>>>>>> +0+ <_start>:

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd

>>>> tmm5,\[ecx\+eiz\*1\]

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

>>>>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd

>>>> tmm1,\[ecx\+edx\*2\]

>>>>>>>

>>>>>>> Is this (not very intuitive) representation agreed with Microsoft's

>>>>>>> MASM team? I ask because I'd prefer it to be visually recognizable

>>>>>>> that the effective address is _not_ [<base>+<index>*<scale>] here.

>>>>>>> The form I'm planning to use in my own disassembler (at least until

>>>>>>> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or maybe

>>>>>>> [<base>,<index>*<scale>]).

>>>>>>

>>>>>> Does this comment apply only to AMX instructions?  Are there any

>>>>>> issues with disassembler in general?

>>>>>

>>>>> I'm not aware of any others, perhaps beside some of the more strange

>>>>> MPX insns. But with MPX discontinued I don't think there's much point

>>>>> worrying about them.

>>>>

>>>> Lili, please take a look.

>>>

>>> Hi Jan,

>>>

>>> Could you help figure out which format I should use? and what's the "n" meaning it the second format, Thanks.

>>> [<base>+<index>*<scale>]

>>> [<base>+<index>*<scale>n]

>>> [<base>,<index>*<scale>]

>>

>> See my original question: "Is this (not very intuitive) representation

>> agreed with Microsoft's MASM team?" I have no contacts there, but I

>> would be assuming you (Intel) have.

> 

> MASM doesn't always support new ISAs.


They may not support it right away, but I suppose eventually they
will.

>> The 'n' was meant to be a literal character 'n', standing for what

>> the operation sections of the insns call "start" (while "stride" is

>> <index>*<scale>).

>>

> 

> We don't invent new assembly syntax for AMX.  Disassembler should match

> what assembler accepts.


Of course - I merely pointed out this aspect at this place. As to
"inventing" - I didn't suggest we do. Instead I suggested checking
with MASM folks, short of the ISA extensions document not suggesting
any syntax.

Jan
Jan Beulich June 30, 2020, 4:32 p.m. | #17
On 30.06.2020 14:26, H.J. Lu wrote:
> On Mon, Jun 29, 2020 at 11:29 PM Jan Beulich <jbeulich@suse.com> wrote:

>>

>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

>>> @@ -3153,6 +3201,16 @@ static const char *att_names_zmm[] = {

>>>    "%zmm28", "%zmm29", "%zmm30", "%zmm31"

>>>  };

>>>

>>> +static const char **names_tmm;

>>> +static const char *intel_names_tmm[] = {

>>> +  "tmm0", "tmm1", "tmm2", "tmm3",

>>> +  "tmm4", "tmm5", "tmm6", "tmm7"

>>> +};

>>> +static const char *att_names_tmm[] = {

>>> +  "%tmm0", "%tmm1", "%tmm2", "%tmm3",

>>> +  "%tmm4", "%tmm5", "%tmm6", "%tmm7"

>>> +};

>>

>> Upon further consideration I don't think this and ...

>>

>>> --- a/opcodes/i386-reg.tbl

>>> +++ b/opcodes/i386-reg.tbl

>>> @@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval

>>>  zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval

>>>  zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval

>>>  zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval

>>> +// TMM registers for AMX

>>> +tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval

>>> +tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval

>>> +tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval

>>> +tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval

>>> +tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval

>>> +tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval

>>> +tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval

>>> +tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval

>>

>> ... this is quite sufficient: How many registers there are depends on

>> the selected palette, and I don't think gas should needlessly restrict

>> encoding options. The disassembler needs to handle the high bits of

>> the register encoding fields in any event - whether by properly

>> decoding the high bits or by properly considering the encodings as

>> (bad) is secondary there (but should of course be in line with the

>> choice on the assembler side).

>>

> 

> Disassembler should check the high bits of the register encoding fields.

> But i386-reg.tbl is used by assembler.   I don't see anything wrong with

> tmm in i386-reg.tbl.


The specification does not restrict %tmm to 0-7; it's merely
palette 1 which does (and the tilecfg register is nothing the
assembler can know the state of). The assembler imo should
support all 16 encodable registers right away. And of course
the disassembler then should follow suit.
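
To illustrate where the 16 encodings come from: a sketch that extracts the three register fields from a three-byte VEX insn with a register-form ModRM, using the bytes "c4 e2 52 5c dc" from the test expectations. The struct and function names are hypothetical, and the field layout is the generic VEX one (inverted R/B bits extending ModRM.reg/ModRM.rm, inverted 4-bit vvvv); nothing here is taken from i386-dis.c.

```c
#include <stdint.h>

/* Hypothetical container for the three register numbers.  */
struct tmm_operands
{
  unsigned int reg;   /* ModRM.reg extended by VEX.R */
  unsigned int rm;    /* ModRM.rm extended by VEX.B  */
  unsigned int vvvv;  /* VEX.vvvv                    */
};

/* INSN points at the 0xc4 byte of a three-byte VEX insn whose ModRM byte
   (mod == 3) directly follows the opcode byte.  */
static struct tmm_operands
decode_tmm_fields (const uint8_t *insn)
{
  uint8_t vex1 = insn[1];   /* ~R ~X ~B mmmmm */
  uint8_t vex2 = insn[2];   /* W ~vvvv L pp   */
  uint8_t modrm = insn[4];
  struct tmm_operands ops;

  /* R and B are stored inverted; each adds a fourth (high) bit.  */
  ops.reg = ((modrm >> 3) & 7) | ((((vex1 >> 7) & 1) ^ 1) << 3);
  ops.rm = (modrm & 7) | ((((vex1 >> 5) & 1) ^ 1) << 3);
  /* vvvv is stored inverted as well and is four bits wide.  */
  ops.vvvv = ((vex2 >> 3) & 0xf) ^ 0xf;
  return ops;
}
```

For "c4 e2 52 5c dc" this gives reg 3, rm 4 and vvvv 5, matching "tdpbf16ps tmm3,tmm4,tmm5" in the expected output. Since every field is four bits wide, values 8-15 are encodable, and both assembler and disassembler have to decide what to do with them.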

Jan
Alan Modra via Binutils June 30, 2020, 4:35 p.m. | #18
On Tue, Jun 30, 2020 at 9:32 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 30.06.2020 14:26, H.J. Lu wrote:

> > On Mon, Jun 29, 2020 at 11:29 PM Jan Beulich <jbeulich@suse.com> wrote:

> >>

> >> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> >>> @@ -3153,6 +3201,16 @@ static const char *att_names_zmm[] = {

> >>>    "%zmm28", "%zmm29", "%zmm30", "%zmm31"

> >>>  };

> >>>

> >>> +static const char **names_tmm;

> >>> +static const char *intel_names_tmm[] = {

> >>> +  "tmm0", "tmm1", "tmm2", "tmm3",

> >>> +  "tmm4", "tmm5", "tmm6", "tmm7"

> >>> +};

> >>> +static const char *att_names_tmm[] = {

> >>> +  "%tmm0", "%tmm1", "%tmm2", "%tmm3",

> >>> +  "%tmm4", "%tmm5", "%tmm6", "%tmm7"

> >>> +};

> >>

> >> Upon further consideration I don't think this and ...

> >>

> >>> --- a/opcodes/i386-reg.tbl

> >>> +++ b/opcodes/i386-reg.tbl

> >>> @@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval

> >>>  zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval

> >>>  zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval

> >>>  zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval

> >>> +// TMM registers for AMX

> >>> +tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval

> >>> +tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval

> >>> +tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval

> >>> +tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval

> >>> +tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval

> >>> +tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval

> >>> +tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval

> >>> +tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval

> >>

> >> ... this is quite sufficient: How many registers there are depends on

> >> the selected palette, and I don't think gas should needlessly restrict

> >> encoding options. The disassembler needs to handle the high bits of

> >> the register encoding fields in any event - whether by properly

> >> decoding the high bits or by properly considering the encodings as

> >> (bad) is secondary there (but should of course be in line with the

> >> choice on the assembler side).

> >>

> >

> > Disassembler should check the high bits of the register encoding fields.

> > But i386-reg.tbl is used by assembler.   I don't see anything wrong with

> > tmm in i386-reg.tbl.

>

> The specification does not restrict %tmm to 0-7; it's merely

> palette 1 which does (and the tilecfg register is nothing the

> assembler can know the state of). The assembler imo should

> support all 16 encodable registers right away. And of course

> the disassembler then should follow suit.

>


Sounds reasonable.   Lili, please add tmm8-15.

-- 
H.J.
Alan Modra via Binutils June 30, 2020, 4:36 p.m. | #19
On Tue, Jun 30, 2020 at 9:29 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 30.06.2020 14:31, H.J. Lu wrote:

> > On Tue, Jun 30, 2020 at 2:33 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>

> >> On 30.06.2020 11:12, Cui, Lili wrote:

> >>>> On Mon, Jun 29, 2020 at 7:48 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>>>

> >>>>> On 29.06.2020 14:46, H.J. Lu wrote:

> >>>>>> On Mon, Jun 29, 2020 at 3:03 AM Jan Beulich <jbeulich@suse.com> wrote:

> >>>>>>> On 28.06.2020 09:43, Cui, Lili via Binutils wrote:

> >>>>>>>> --- /dev/null

> >>>>>>>> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d

> >>>>>>>> @@ -0,0 +1,69 @@

> >>>>>>>> +#as:

> >>>>>>>> +#objdump: -d -Mintel

> >>>>>>>> +#name: x86_64 AMX insns in Intel syntax

> >>>>>>>> +#source: x86-64-amx.s

> >>>>>>>> +

> >>>>>>>> +.*: +file format .*

> >>>>>>>> +

> >>>>>>>> +

> >>>>>>>> +Disassembly of section \.text:

> >>>>>>>> +

> >>>>>>>> +0+ <_start>:

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 78 49 04 51[    ]*ldtilecfg \[rcx\+rdx\*2\]

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 79 49 04 51[    ]*sttilecfg \[rcx\+rdx\*2\]

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 52 5c dc[       ]*tdpbf16ps tmm3,tmm4,tmm5

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 63 5e ca[       ]*tdpbssd tmm1,tmm2,tmm3

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 62 5e ca[       ]*tdpbsud tmm1,tmm2,tmm3

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 61 5e ca[       ]*tdpbusd tmm1,tmm2,tmm3

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 60 5e ca[       ]*tdpbuud tmm1,tmm2,tmm3

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 25 00[         ]*tileloadd tmm5,ds:0x0

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*00 00 00[     ]*

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 21[    ]*tileloadd tmm5,\[rcx\+riz\*1\]

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 2c 21[         ]*tileloadd

> >>>> tmm5,\[ecx\+eiz\*1\]

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*c4 e2 7b 4b 2c 11[    ]*tileloadd tmm5,\[rcx\+rdx\*1\]

> >>>>>>>> +[    ]*[a-f0-9]+:[   ]*67 c4 e2 7b 4b 0c 51[         ]*tileloadd

> >>>> tmm1,\[ecx\+edx\*2\]

> >>>>>>>

> >>>>>>> Is this (not very intuitive) representation agreed with Microsoft's

> >>>>>>> MASM team? I ask because I'd prefer it to be visually recognizable

> >>>>>>> that the effective address is _not_ [<base>+<index>*<scale>] here.

> >>>>>>> The form I'm planning to use in my own disassembler (at least until

> >>>>>>> knowing otherwise for MASM) is [<base>+<index>*<scale>n] (or maybe

> >>>>>>> [<base>,<index>*<scale>]).

> >>>>>>

> >>>>>> Does this comment apply only to AMX instructions?  Are there any

> >>>>>> issues with disassembler in general?

> >>>>>

> >>>>> I'm not aware of any others, perhaps beside some of the more strange

> >>>>> MPX insns. But with MPX discontinued I don't think there's much point

> >>>>> worrying about them.

> >>>>

> >>>> Lili, please take a look.

> >>>

> >>> Hi Jan,

> >>>

> >>> Could you help figure out which format I should use? and what's the "n" meaning it the second format, Thanks.

> >>> [<base>+<index>*<scale>]

> >>> [<base>+<index>*<scale>n]

> >>> [<base>,<index>*<scale>]

> >>

> >> See my original question: "Is this (not very intuitive) representation

> >> agreed with Microsoft's MASM team?" I have no contacts there, but I

> >> would be assuming you (Intel) have.

> >

> > MASM doesn't always support new ISAs.

>

> They may not support it right away, but I suppose eventually they

> will.

>

> >> The 'n' was meant to be a literal character 'n', standing for what

> >> the operation sections of the insns call "start" (while "stride" is

> >> <index>*<scale>).

> >>

> >

> > We don't invent new assembly syntax for AMX.  Disassembler should match

> > what assembler accepts.

>

> Of course - I merely pointed out this aspect at this place. As to

> "inventing" - I didn't suggest we do. Instead I suggested checking

> with MASM folks, short of the ISA extensions document not suggesting

> any syntax.

>


Let's go with what we have and change it to match MASM later.
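
For readers following the syntax question above, here is a minimal sketch (not code from the patch) of why the operand is not an ordinary [<base>+<index>*<scale>] address. It assumes the tile load/store semantics given in the ISA extensions document: base plus displacement is where the tile begins in memory, index << scale is the stride between rows, and the "n" in Jan's proposed notation is the row counter. The names decode_sib, tile_row_address and membegin are illustrative only.

```c
#include <stdint.h>

/* Fields of an x86 SIB byte: scale (bits 7:6), index (5:3), base (2:0).  */
struct sib_fields
{
  unsigned scale_log2;	/* 0..3, meaning *1, *2, *4, *8.  */
  unsigned index;	/* 4 means "no index" (printed as riz/eiz).  */
  unsigned base;
};

static struct sib_fields
decode_sib (uint8_t sib)
{
  struct sib_fields f;

  f.scale_log2 = (sib >> 6) & 3;
  f.index = (sib >> 3) & 7;
  f.base = sib & 7;
  return f;
}

/* Address of tile row ROW.  The index term is a per-row stride and is
   multiplied by the row number, rather than being added once as in a
   normal effective address.  */
static uint64_t
tile_row_address (uint64_t base, int64_t disp, uint64_t index,
		  unsigned scale_log2, unsigned row)
{
  uint64_t membegin = base + (uint64_t) disp;
  uint64_t stride = index << scale_log2;

  return membegin + (uint64_t) row * stride;
}
```

With the SIB byte 0x51 from the test above (tileloadd tmm1,[ecx+edx*2]), decode_sib yields base 1 (cx), index 2 (dx) and a scale of 2.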


-- 
H.J.
Jan Beulich July 1, 2020, 10:02 a.m. | #20
On 30.06.2020 18:36, H.J. Lu wrote:
> Let's go with what we have and change it to match MASM later.


Well, at the risk of stating the obvious: We then won't be able to
later drop the support for what we currently support, at least not
easily. But anyway ...

Jan
Alan Modra via Binutils July 1, 2020, 2:40 p.m. | #21
> -----Original Message-----

> From: Jan Beulich <jbeulich@suse.com>

> Sent: Wednesday, July 1, 2020 6:02 PM

> To: H.J. Lu <hjl.tools@gmail.com>

> Cc: Cui, Lili <lili.cui@intel.com>; binutils@sourceware.org

> Subject: Re: x86: Add support for Intel AMX instructions

> 

> On 30.06.2020 18:36, H.J. Lu wrote:

> > Let's go with what we have and change it to match MASM later.

> 

> Well, at the risk of stating the obvious: We then won't be able to later drop

> the support for what we currently support, at least not easily. But anyway ...

> 

> Jan


Hi Jan,

Thank you for your careful review and suggestions. I have made the following changes in the attached patch.

1.  Replace the dashes with underscores, for example use "amx_tile" instead of "amx-tile".
2.  Add a TMM register name check to check_register() for when AMX is disabled, or outside of 64-bit mode.
3.  Rename the AMX enumerators so that suffixes appear in decode order in the disassembler.
4.  When CpuAMX_TILE is disabled, the other two CpuAMX_* features are also disabled, as amx_tile is the base feature.
5.  Add Modrm to Sibmem: #define Sibmem SIB=SIBMEM|Modrm
6.  Change "0xf34B" to "0xf34b".
7.  Change the "tilerelease" opcode to 2 bytes, with a value of 0x49c0.
8.  Add VEX128 and VexW0 to all AMX instructions.
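
Item 4 can be sketched as follows. This is only an illustration of the dependency idea with hypothetical bit names (FEAT_*, ANY_*, disable_features), not the generated CPU_ANY_*_FLAGS from i386-init.h: the mask cleared when the base feature is disabled also covers the features built on top of it.

```c
#include <stdint.h>

/* Hypothetical feature bits, for illustration only.  */
enum
{
  FEAT_AMX_TILE = 1 << 0,
  FEAT_AMX_INT8 = 1 << 1,
  FEAT_AMX_BF16 = 1 << 2
};

/* Bits cleared by each "no<feature>" directive.  Disabling the base
   feature also clears everything that depends on it.  */
#define ANY_AMX_INT8 (FEAT_AMX_INT8)
#define ANY_AMX_BF16 (FEAT_AMX_BF16)
#define ANY_AMX_TILE (FEAT_AMX_TILE | FEAT_AMX_INT8 | FEAT_AMX_BF16)

static uint32_t
disable_features (uint32_t enabled, uint32_t any_mask)
{
  return enabled & ~any_mask;
}
```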


Thanks,
Lili.
Alan Modra via Binutils July 2, 2020, 1:24 a.m. | #22
On Wed, Jul 1, 2020 at 7:40 AM Cui, Lili <lili.cui@intel.com> wrote:
>

> > -----Original Message-----

> > From: Jan Beulich <jbeulich@suse.com>

> > Sent: Wednesday, July 1, 2020 6:02 PM

> > To: H.J. Lu <hjl.tools@gmail.com>

> > Cc: Cui, Lili <lili.cui@intel.com>; binutils@sourceware.org

> > Subject: Re: x86: Add support for Intel AMX instructions

> >

> > On 30.06.2020 18:36, H.J. Lu wrote:

> > > Let's go with what we have and change it to match MASM later.

> >

> > Well, at the risk of stating the obvious: We then won't be able to later drop

> > the support for what we currently support, at least not easily. But anyway ...

> >

> > Jan

>

> Hi Jan,

>

> Thank you for your careful inspection and suggestions, I update the following modifications in the attachment patch.

>

> 1.  Delete the dashes, for example use "amx_tile" instead of "amx-tile".

> 2.  Add TMM register name check when AMX is disabled, or outside of 64-bit mode to check_register()

> 3.  Modify AMX name to follow suffixes appear in decode order in disassembler.

> 4. When disable CpuAMX_TILE, other two CpuAMX_* should also be disabled, as amx_tile is the base feature.

> 5.  Add Modrm to Sibmem, #define Sibmem SIB=SIBMEM|Modrm

> 6.  Change "0xf34B"  to "0xf34b"

> 7.  Change" tilerelease" opcode in to 2 byte and have a value of 0x49c0.

> 8.  Add VEX128 and VexW0 to all AMX instructions.

>


It is OK.  Please give Jan some time to review.

Thanks.

-- 
H.J.
Jan Beulich July 2, 2020, 11:22 a.m. | #23
On 01.07.2020 16:40, Cui, Lili wrote:
> Thank you for your careful inspection and suggestions, I update

> the following modifications in the attachment patch.


Thanks.

> 1.  Delete the dashes, for example use "amx_tile" instead of "amx-tile".


Urgh - my comment was meant the other way around. I generally think
dashes ought to be preferred over underscores, with the latter used
only where the former can't be used because of (often) lexical
restrictions. Yet then again I realize all pre-existing ones do use
underscores, so perhaps it's fine this way.

> 2.  Add TMM register name check when AMX is disabled, or outside of 64-bit mode to check_register()

> 3.  Modify AMX name to follow suffixes appear in decode order in disassembler.


I'm afraid I don't understand this one.

> 4. When disable CpuAMX_TILE, other two CpuAMX_* should also be disabled, as amx_tile is the base feature.

> 5.  Add Modrm to Sibmem, #define Sibmem SIB=SIBMEM|Modrm 

> 6.  Change "0xf34B"  to "0xf34b"

> 7.  Change" tilerelease" opcode in to 2 byte and have a value of 0x49c0.

> 8.  Add VEX128 and VexW0 to all AMX instructions.


As a general request - please inline at least the non-generated
parts of patches, as it is quite a bit more cumbersome to comment
on attachments.

On the actual changes: Can you really get away without e.g. also
adjusting match_simd_size()? It would seem to me that one of

	vaddps %tmm1,%tmm2,%tmm3
	tdpbssd %xmm1,%xmm2,%xmm3

may have the function wrongly return true. Overall I think you
need to carefully go over all existing .bitfield.{x,y,z}mmword
and RegSIMD uses to check whether they need adjustment.
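
To illustrate the concern, here is a simplified stand-alone model of the size check. The struct and function names are hypothetical and the real match_simd_size() takes different arguments; the check_tmm flag exists only to compare behaviour with and without a tmmword test.

```c
#include <stdbool.h>

/* Simplified stand-in for the size bits of i386_operand_type.  */
struct size_bits
{
  bool xmmword, ymmword, zmmword, tmmword;
};

/* Return true when the size of the GIVEN operand is acceptable to the
   template operand WANTED.  */
static bool
simd_size_ok (struct size_bits given, struct size_bits wanted,
	      bool check_tmm)
{
  return !((given.xmmword && !wanted.xmmword)
	   || (given.ymmword && !wanted.ymmword)
	   || (given.zmmword && !wanted.zmmword)
	   || (check_tmm && given.tmmword && !wanted.tmmword));
}
```

In this model a %tmm operand given to a template that wants xmm/ymm slips through unless the tmmword test is present, while an %xmm operand given to a tile template is already caught by the existing xmmword test.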

I see VEXOP3 is still there. Would you mind explaining why it is
needed? (See my respective, more extensive remark on v1 of the
patch.)

Besides my dislike for i.has_regtmm in general, as expressed
before, I find the way you insert the code to set it somewhat
odd - you put it between xmm and ymm, rather than either ahead
of all pre-existing ones, or after.

I'm also wondering whether the %rip-relative addressing check
couldn't be arranged to live together with pre-existing ones
(some MPX insns have such a restriction, too).

You also add at least two instances of code along the lines of

@@ -8007,7 +8074,9 @@ build_modrm_byte (void)
 		{
 		  i386_operand_type newdisp;
 
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only check for VSIB.  */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);

where comment and code are not fully in sync. If you only want
to check for VSIB, then you mean

		  gas_assert (i.tm.opcode_modifier.sib != VECSIB128
			      && i.tm.opcode_modifier.sib != VECSIB256
			      && i.tm.opcode_modifier.sib != VECSIB512);

But of course it could as well be the comment that gets changed.
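
A small sketch of the difference between the two forms, assuming the sib modifier currently takes only the five values below (the enum and its numbering are illustrative, not copied from i386-opc.h). Both predicates agree on today's values; they diverge only if a further non-vector kind were ever added, which is when the comment/code mismatch would start to matter.

```c
#include <stdbool.h>

/* Assumed kinds of the sib opcode modifier; numbering is illustrative.  */
enum sib_kind
{
  SIB_NONE = 0,
  VECSIB128,
  VECSIB256,
  VECSIB512,
  SIBMEM
};

/* The condition as posted: no SIB modifier at all, or SIBMEM.  */
static bool
not_vsib_as_posted (enum sib_kind sib)
{
  return !sib || sib == SIBMEM;
}

/* The condition spelled the way the comment describes it.  */
static bool
not_vsib_explicit (enum sib_kind sib)
{
  return sib != VECSIB128 && sib != VECSIB256 && sib != VECSIB512;
}
```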

Did the testcases change at all? I still don't see any checking
that VEX.W or VEX.L being incorrectly set would result in failed
disassembly. I also think you want to extend at least the
intel-regs testcase to prove that tmm<N> get treated as normal
symbols outside of 64-bit mode. A similar check for 64-bit mode
with AMX disabled would also be nice, but it looks there's no
pre-existing test that you could extend.
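
The kind of check being asked for can be sketched like this. It is an illustration only (hypothetical function names, not the table-driven decoding in i386-dis.c), assuming the three-byte C4 VEX layout where the third prefix byte holds W, vvvv, L and pp, and assuming AMX encodings require W = 0 and L = 0 as item 8 states.

```c
#include <stdbool.h>
#include <stdint.h>

/* Third byte of a three-byte (C4) VEX prefix: W vvvv L pp.  */
static bool
vex_w (uint8_t byte2)
{
  return (byte2 >> 7) & 1;
}

static bool
vex_l (uint8_t byte2)
{
  return (byte2 >> 2) & 1;
}

/* An AMX encoding is only accepted with VEX.W = 0 and VEX.L = 0; anything
   else should be shown as a bad encoding by a disassembler.  */
static bool
amx_vex_fields_valid (uint8_t byte2)
{
  return !vex_w (byte2) && !vex_l (byte2);
}
```

A test file could then feed in the tdpbssd bytes from the test above with 0x63 changed to 0xe3 (W set) or 0x67 (L set) and expect a bad-encoding line.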

As to void_mode you introduce in the disassembler: Is this really
needed? Other insns not wanting any specific size (fldenv is the
first example I can think of) look to simply use 0 in such a case.

Can you please insert new enumerators at their designated places?
For example, the various MOD_VEX_* now sit in the middle of non-
VEX ones, when further down there already is a group of VEX ones.
For maintainability following existing style and arrangements is
really quite important.

Along these lines just look at

+  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_MOD_3_REG_0_RM_0_LEN_0,

Why two different styles even in adjacent lines? Or look at this

@@ -1852,7 +1889,19 @@ enum
   VEX_LEN_0F381A_P_2_M_0,
   VEX_LEN_0F3836_P_2,
   VEX_LEN_0F3841_P_2,
+  LEN_VEX_0F3849_P_0_W_0_M_0,
+  LEN_VEX_0F3849_MOD_3_REG_0_RM_0,
+  LEN_VEX_0F3849_P_2_W_0_M_0,
+  LEN_VEX_0F3849_P_3_W_0_M_0,
+  LEN_VEX_0F384B_P_1_W_0_M_0,
+  LEN_VEX_0F384B_P_2_W_0_M_0,
+  LEN_VEX_0F384B_P_3_W_0_M_0,
   VEX_LEN_0F385A_P_2_M_0,
+  LEN_VEX_0F385C_P_1_W_0_M_0,
+  LEN_VEX_0F385E_P_0_W_0_M_0,
+  LEN_VEX_0F385E_P_1_W_0_M_0,
+  LEN_VEX_0F385E_P_2_W_0_M_0,
+  LEN_VEX_0F385E_P_3_W_0_M_0,
   VEX_LEN_0F38DB_P_2,
   VEX_LEN_0F38F2_P_0,
   VEX_LEN_0F38F3_R_1_P_0,

You insert LEN_VEX_* when everything around is named VEX_LEN_*.
And there's again an outlier style wise (which is also lacking a
_P_<n> infix from the looks of it).

There's also still no support for or checking of uses of
%tmm8...%tmm15. Even worse, there are a number of "reg > 8"
checks, when the respective arrays only have 8 entries.
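
The off-by-one can be shown with a stand-alone sketch, assuming an eight-entry name table like the one in the patch (tmm_names and tmm_name here are illustrative, not the disassembler's own identifiers).

```c
#include <stddef.h>

static const char *const tmm_names[8] =
{
  "tmm0", "tmm1", "tmm2", "tmm3", "tmm4", "tmm5", "tmm6", "tmm7"
};

/* Return the register name, or NULL for an out-of-range number.  A
   "reg > 8" test would let reg == 8 through and read one element past
   the end of the array.  */
static const char *
tmm_name (unsigned reg)
{
  if (reg >= sizeof tmm_names / sizeof tmm_names[0])
    return NULL;
  return tmm_names[reg];
}
```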

Thanks, Jan
Alan Modra via Binutils July 2, 2020, 3:58 p.m. | #24
> -----Original Message-----

> From: Jan Beulich <jbeulich@suse.com>

> Sent: Thursday, July 2, 2020 7:22 PM

> To: Cui, Lili <lili.cui@intel.com>

> Cc: H.J. Lu <hjl.tools@gmail.com>; binutils@sourceware.org

> Subject: Re: x86: Add support for Intel AMX instructions

> 

> On 01.07.2020 16:40, Cui, Lili wrote:

> > Thank you for your careful inspection and suggestions, I update the

> > following modifications in the attachment patch.

> 

> Thanks.

> 

> > 1.  Delete the dashes, for example use "amx_tile" instead of "amx-tile".

> 

> Urgh - my comment was meant the other way around. I generally think

> dashes ought to be preferred over underscores, with the latter used only

> where the former can't be used because of (often) lexical restrictions. Yet

> then again I realize all pre-existing ones do use underscores, so perhaps it's

> fine this way.

> 

> > 2.  Add TMM register name check when AMX is disabled, or outside of 64-bit mode to check_register()

> > 3.  Modify AMX name to follow suffixes appear in decode order in disassembler.

> 

> I'm afraid I don't understand this one.


2. "Also below I'm missing some form of adjustment to check_register() in this patch.
%tmm<N> should not be recognized as register names (which is even more important
in "noprefix" mode) when AMX is disabled, or outside of 64-bit mode."

3. "This looks misnamed; afaict it should be REG_VEX_0F3849_P_0_W_0_M_3.
Suffixes should appear in decode order, to help the reader both follow the logic
and look up related table entries."

Sorry, what I meant is that I have fixed both of these; items 2 and 3 refer to your remarks on v1 of the patch, quoted above.
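
The intent of item 2 can be sketched as below. This is a simplified, hypothetical model (asm_state and is_tmm_register_name are not names from tc-i386.c, and the real check_register() works on register table entries rather than strings): a tmm<N> name only counts as a register in 64-bit mode with AMX enabled, and otherwise falls through to being an ordinary symbol.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical assembler state, for illustration only.  */
struct asm_state
{
  bool code_64bit;
  bool amx_tile_enabled;
};

/* Return true if NAME should be treated as a TMM register.  Only
   tmm0..tmm7 are considered in this sketch.  */
static bool
is_tmm_register_name (const char *name, const struct asm_state *st)
{
  if (strncmp (name, "tmm", 3) != 0
      || name[3] < '0' || name[3] > '7'
      || name[4] != '\0')
    return false;

  /* Outside of 64-bit mode, or with AMX disabled, the name is just a
     symbol.  */
  return st->code_64bit && st->amx_tile_enabled;
}
```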

> 

> > 4. When disable CpuAMX_TILE, other two CpuAMX_* should also be disabled, as amx_tile is the base feature.

> > 5.  Add Modrm to Sibmem, #define Sibmem SIB=SIBMEM|Modrm

> > 6.  Change "0xf34B"  to "0xf34b"

> > 7.  Change" tilerelease" opcode in to 2 byte and have a value of 0x49c0.

> > 8.  Add VEX128 and VexW0 to all AMX instructions.

> 

> As a general request - please inline at least the non-generated parts of

> patches, as it is quite a bit more cumbersome to comment on attachments.

> 


I will inline it next time, thanks.

> On the actual changes: Can you really get away without e.g. also adjusting

> match_simd_size()? It would seem to me that one of

> 

> 	vaddps %tmm1,%tmm2,%tmm3

> 	tdpbssd %xmm1,%xmm2,%xmm3

> 

> may have the function wrongly return true. Overall I think you need to

> carefully go over all existing .bitfield.{x,y,z}mmword and RegSIMD uses to

> check whether they need adjustment.


OK, I will check it all over.

> 

> I see VEXOP3 is still there. Would you mind explaining why it is needed? (See

> my respective, more extensive remark on v1 of the

> patch.)


The BEXTR you mentioned does indeed use the same encoding method as AMX.
We will delete VEXOP3 and find a better way to handle both of them.
 
> 

> Besides my dislike for i.has_regtmm in general, as expressed before, I find

> the way you insert the code to set it somewhat odd - you put it between

> xmm and ymm, rather than either ahead of all pre-existing ones, or after.

> 


I will fix it.

> I'm also wondering whether the %rip-relative addressing check couldn't be

> arranged to live together with pre-existing ones (some MPX insns have such a

> restriction, too).

> 


I will fix it.

> You also add at least two instances of code along the lines of

> 

> @@ -8007,7 +8074,9 @@ build_modrm_byte (void)

>  		{

>  		  i386_operand_type newdisp;

> 

> -		  gas_assert (!i.tm.opcode_modifier.sib);

> +		  /* Only check for VSIB.  */

> +		  gas_assert (!i.tm.opcode_modifier.sib

> +			      || i.tm.opcode_modifier.sib == SIBMEM);

> 

> where comment and code are not fully in sync. If you only want to check for

> VSIB, then you mean

> 

> 		  gas_assert (i.tm.opcode_modifier.sib != VECSIB128

> 			      && i.tm.opcode_modifier.sib != VECSIB256

> 			      && i.tm.opcode_modifier.sib != VECSIB512);

> 

> But of course it could as well be the comment that gets changed.


You are right, I need to change the code.

> 

> Did the testcases change at all? I still don't see any checking that VEX.W or

> VEX.L being incorrectly set would result in failed disassembly. I also think you

> want to extend at least the intel-regs testcase to prove that tmm<N> get

> treated as normal symbols outside of 64-bit mode. A similar check for 64-bit

> mode with AMX disabled would also be nice, but it looks there's no pre-

> existing test that you could extend.


I will add them.

> As to void_mode you introduce in the disassembler: Is this really needed?

> Other insns not wanting any specific size (fldenv is the first example I can

> think of) look to simply use 0 in such a case.

> 


I will fix it.

> Can you please insert new enumerators at their designated places?

> For example, the various MOD_VEX_* now sit in the middle of non- VEX ones,

> when further down there already is a group of VEX ones.

> For maintainability following existing style and arrangements is really quite

> important.

> 

> Along these lines just look at

> 

> +  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,

> +  X86_64_VEX_0F3849_MOD_3_REG_0_RM_0_LEN_0,

> 

> Why two different styles even in adjacent lines? Or look at this

> 


Because "REG_0_RM_0" would abbreviate to "R_0_R_0", which is ambiguous, I used the full names instead.

> @@ -1852,7 +1889,19 @@ enum

>    VEX_LEN_0F381A_P_2_M_0,

>    VEX_LEN_0F3836_P_2,

>    VEX_LEN_0F3841_P_2,

> +  LEN_VEX_0F3849_P_0_W_0_M_0,

> +  LEN_VEX_0F3849_MOD_3_REG_0_RM_0,

> +  LEN_VEX_0F3849_P_2_W_0_M_0,

> +  LEN_VEX_0F3849_P_3_W_0_M_0,

> +  LEN_VEX_0F384B_P_1_W_0_M_0,

> +  LEN_VEX_0F384B_P_2_W_0_M_0,

> +  LEN_VEX_0F384B_P_3_W_0_M_0,

>    VEX_LEN_0F385A_P_2_M_0,

> +  LEN_VEX_0F385C_P_1_W_0_M_0,

> +  LEN_VEX_0F385E_P_0_W_0_M_0,

> +  LEN_VEX_0F385E_P_1_W_0_M_0,

> +  LEN_VEX_0F385E_P_2_W_0_M_0,

> +  LEN_VEX_0F385E_P_3_W_0_M_0,

>    VEX_LEN_0F38DB_P_2,

>    VEX_LEN_0F38F2_P_0,

>    VEX_LEN_0F38F3_R_1_P_0,

> 

> You insert LEN_VEX_* when everything around is named VEX_LEN_*.

> And there's again an outlier style wise (which is also lacking a _P_<n> infix

> from the looks of it).


I will fix it.

> 

> There's also still no support for or checking of uses of %tmm8...%tmm15.

> Even worse, there are a number of "reg > 8"

> checks, when the respective arrays only have 8 entries.


Sorry, it should be "reg > 0x07"; I will fix it.
The spec only defines TMM0..TMM7, so any tmm register number exceeding 7 is illegal.

Thanks, 
Lili.


> Thanks, Jan
Jan Beulich July 2, 2020, 4:31 p.m. | #25
On 02.07.2020 17:58, Cui, Lili wrote:
>> From: Jan Beulich <jbeulich@suse.com>

>> Sent: Thursday, July 2, 2020 7:22 PM

>>

>> For maintainability following existing style and arrangements is really quite

>> important.

>>

>> Along these lines just look at

>>

>> +  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,

>> +  X86_64_VEX_0F3849_MOD_3_REG_0_RM_0_LEN_0,

>>

>> Why two different styles even in adjacent lines? Or look at this

>>

> 

> Because the abbreviation of  "REG_0_RM_0" is " R_0_R_0" , so I used full name instead.


Oh, I see. At least MOD and LEN should still be abbreviated then,
I think.

>> @@ -1852,7 +1889,19 @@ enum

>>    VEX_LEN_0F381A_P_2_M_0,

>>    VEX_LEN_0F3836_P_2,

>>    VEX_LEN_0F3841_P_2,

>> +  LEN_VEX_0F3849_P_0_W_0_M_0,

>> +  LEN_VEX_0F3849_MOD_3_REG_0_RM_0,

>> +  LEN_VEX_0F3849_P_2_W_0_M_0,

>> +  LEN_VEX_0F3849_P_3_W_0_M_0,

>> +  LEN_VEX_0F384B_P_1_W_0_M_0,

>> +  LEN_VEX_0F384B_P_2_W_0_M_0,

>> +  LEN_VEX_0F384B_P_3_W_0_M_0,

>>    VEX_LEN_0F385A_P_2_M_0,

>> +  LEN_VEX_0F385C_P_1_W_0_M_0,

>> +  LEN_VEX_0F385E_P_0_W_0_M_0,

>> +  LEN_VEX_0F385E_P_1_W_0_M_0,

>> +  LEN_VEX_0F385E_P_2_W_0_M_0,

>> +  LEN_VEX_0F385E_P_3_W_0_M_0,

>>    VEX_LEN_0F38DB_P_2,

>>    VEX_LEN_0F38F2_P_0,

>>    VEX_LEN_0F38F3_R_1_P_0,

>>

>> You insert LEN_VEX_* when everything around is named VEX_LEN_*.

>> And there's again an outlier style wise (which is also lacking a _P_<n> infix

>> from the looks of it).

> 

> I will fix it.

> 

>>

>> There's also still no support for or checking of uses of %tmm8...%tmm15.

>> Even worse, there are a number of "reg > 8"

>> checks, when the respective arrays only have 8 entries.

> 

> Sorry, it should be "reg > 0x07", I will fix it.

> From spec we only define TMM0..TMM7, so any tmm register number exceeding

> 7 is illegal.


There's nowhere the spec says so, afaics. What is or is not legal
is solely controlled by the chosen palette (which is not something
you can know at assembly time). We've already settled with H.J. on
providing all 16 registers.

Jan
Alan Modra via Binutils July 2, 2020, 4:50 p.m. | #26
On Thu, Jul 2, 2020 at 9:31 AM Jan Beulich <jbeulich@suse.com> wrote:
>

> On 02.07.2020 17:58, Cui, Lili wrote:

> >> From: Jan Beulich <jbeulich@suse.com>

> >> Sent: Thursday, July 2, 2020 7:22 PM

> >>

> >> For maintainability following existing style and arrangements is really quite

> >> important.

> >>

> >> Along these lines just look at

> >>

> >> +  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,

> >> +  X86_64_VEX_0F3849_MOD_3_REG_0_RM_0_LEN_0,

> >>

> >> Why two different styles even in adjacent lines? Or look at this

> >>

> >

> > Because the abbreviation of  "REG_0_RM_0" is " R_0_R_0" , so I used full name instead.

>

> Oh, I see. At least MOD and LEN should still be abbreviated then,

> I think.

>

> >> @@ -1852,7 +1889,19 @@ enum

> >>    VEX_LEN_0F381A_P_2_M_0,

> >>    VEX_LEN_0F3836_P_2,

> >>    VEX_LEN_0F3841_P_2,

> >> +  LEN_VEX_0F3849_P_0_W_0_M_0,

> >> +  LEN_VEX_0F3849_MOD_3_REG_0_RM_0,

> >> +  LEN_VEX_0F3849_P_2_W_0_M_0,

> >> +  LEN_VEX_0F3849_P_3_W_0_M_0,

> >> +  LEN_VEX_0F384B_P_1_W_0_M_0,

> >> +  LEN_VEX_0F384B_P_2_W_0_M_0,

> >> +  LEN_VEX_0F384B_P_3_W_0_M_0,

> >>    VEX_LEN_0F385A_P_2_M_0,

> >> +  LEN_VEX_0F385C_P_1_W_0_M_0,

> >> +  LEN_VEX_0F385E_P_0_W_0_M_0,

> >> +  LEN_VEX_0F385E_P_1_W_0_M_0,

> >> +  LEN_VEX_0F385E_P_2_W_0_M_0,

> >> +  LEN_VEX_0F385E_P_3_W_0_M_0,

> >>    VEX_LEN_0F38DB_P_2,

> >>    VEX_LEN_0F38F2_P_0,

> >>    VEX_LEN_0F38F3_R_1_P_0,

> >>

> >> You insert LEN_VEX_* when everything around is named VEX_LEN_*.

> >> And there's again an outlier style wise (which is also lacking a _P_<n> infix

> >> from the looks of it).

> >

> > I will fix it.

> >

> >>

> >> There's also still no support for or checking of uses of %tmm8...%tmm15.

> >> Even worse, there are a number of "reg > 8"

> >> checks, when the respective arrays only have 8 entries.

> >

> > Sorry, it should be "reg > 0x07", I will fix it.

> > From spec we only define TMM0..TMM7, so any tmm register number exceeding

> > 7 is illegal.

>

> There's nowhere the spec says so, afaics. What is or is not legal

> is solely controlled by the chosen palette (which is not something

> you can know at assembly time). We've already settled with H.J. on

> providing all 16 registers.

>


After internal discussion, we decided to treat TMM registers like
mask registers.  The encoding supports 16 registers, but only the first 8 are
valid.

-- 
H.J.
Jan Beulich July 3, 2020, 9:32 a.m. | #27
On 02.07.2020 18:50, H.J. Lu wrote:
> On Thu, Jul 2, 2020 at 9:31 AM Jan Beulich <jbeulich@suse.com> wrote:

>>

>> On 02.07.2020 17:58, Cui, Lili wrote:

>>>> From: Jan Beulich <jbeulich@suse.com>

>>>> Sent: Thursday, July 2, 2020 7:22 PM

>>>>

>>>> For maintainability following existing style and arrangements is really quite

>>>> important.

>>>>

>>>> Along these lines just look at

>>>>

>>>> +  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,

>>>> +  X86_64_VEX_0F3849_MOD_3_REG_0_RM_0_LEN_0,

>>>>

>>>> Why two different styles even in adjacent lines? Or look at this

>>>>

>>>

>>> Because the abbreviation of  "REG_0_RM_0" is " R_0_R_0" , so I used full name instead.

>>

>> Oh, I see. At least MOD and LEN should still be abbreviated then,

>> I think.

>>

>>>> @@ -1852,7 +1889,19 @@ enum

>>>>    VEX_LEN_0F381A_P_2_M_0,

>>>>    VEX_LEN_0F3836_P_2,

>>>>    VEX_LEN_0F3841_P_2,

>>>> +  LEN_VEX_0F3849_P_0_W_0_M_0,

>>>> +  LEN_VEX_0F3849_MOD_3_REG_0_RM_0,

>>>> +  LEN_VEX_0F3849_P_2_W_0_M_0,

>>>> +  LEN_VEX_0F3849_P_3_W_0_M_0,

>>>> +  LEN_VEX_0F384B_P_1_W_0_M_0,

>>>> +  LEN_VEX_0F384B_P_2_W_0_M_0,

>>>> +  LEN_VEX_0F384B_P_3_W_0_M_0,

>>>>    VEX_LEN_0F385A_P_2_M_0,

>>>> +  LEN_VEX_0F385C_P_1_W_0_M_0,

>>>> +  LEN_VEX_0F385E_P_0_W_0_M_0,

>>>> +  LEN_VEX_0F385E_P_1_W_0_M_0,

>>>> +  LEN_VEX_0F385E_P_2_W_0_M_0,

>>>> +  LEN_VEX_0F385E_P_3_W_0_M_0,

>>>>    VEX_LEN_0F38DB_P_2,

>>>>    VEX_LEN_0F38F2_P_0,

>>>>    VEX_LEN_0F38F3_R_1_P_0,

>>>>

>>>> You insert LEN_VEX_* when everything around is named VEX_LEN_*.

>>>> And there's again an outlier style wise (which is also lacking a _P_<n> infix

>>>> from the looks of it).

>>>

>>> I will fix it.

>>>

>>>>

>>>> There's also still no support for or checking of uses of %tmm8...%tmm15.

>>>> Even worse, there are a number of "reg > 8"

>>>> checks, when the respective arrays only have 8 entries.

>>>

>>> Sorry, it should be "reg > 0x07", I will fix it.

>>> From spec we only define TMM0..TMM7, so any tmm register number exceeding

>>> 7 is illegal.

>>

>> There's nowhere the spec says so, afaics. What is or is not legal

>> is solely controlled by the chosen palette (which is not something

>> you can know at assembly time). We've already settled with H.J. on

>> providing all 16 registers.

> 

> After internal discussion, we decide to treat TMM registers like

> mask registers.  Encoding supports 16 registers, but only first 8 are

> valid.


Hmm, I disagree (without the doc getting changed to express this, ideally
alongside a "why"), but I guess I get no say here. However, if so,
then this absolutely needs to be accompanied by a specification of what
VEX.R, VEX.B, and the high bit of VEX.VVVV mean then: Are they to be
ignored, or would them being clear be a reason for #UD? There are similar
rules for the mask registers, after all (#UD for the first and last of
the cases, while VEX.B is ignored, from all I can tell).

Also what about a hypothetical palette then for which e.g. registers
above %tmm3 "are not valid tiles" (using the ISA Extensions doc wording
intentionally here)?
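
For reference, the three bits in question are what extend each register field from 3 to 4 bits. The sketch below (illustrative names, not disassembler code) assumes the standard three-byte C4 VEX layout, in which R, B and vvvv are stored inverted, so a clear bit selects the upper half of the register file.

```c
#include <stdint.h>

struct vex_regs
{
  unsigned reg, rm, vvvv;
};

/* INSN is a three-byte-VEX instruction with a register-form ModRM:
   C4, byte1 (R X B mmmmm), byte2 (W vvvv L pp), opcode, ModRM.  */
static struct vex_regs
decode_vex_regs (const uint8_t insn[5])
{
  struct vex_regs r;
  unsigned vex_r = ((insn[1] >> 7) & 1) ^ 1;	/* Stored inverted.  */
  unsigned vex_b = ((insn[1] >> 5) & 1) ^ 1;	/* Stored inverted.  */
  uint8_t modrm = insn[4];

  r.reg = (vex_r << 3) | ((modrm >> 3) & 7);
  r.rm = (vex_b << 3) | (modrm & 7);
  r.vvvv = ~((unsigned) insn[2] >> 3) & 0xf;	/* Stored inverted.  */
  return r;
}
```

Applied to c4 e2 63 5e ca from the test file (tdpbssd tmm1,tmm2,tmm3) this gives register numbers 1, 2 and 3; clearing the R bit (byte1 0x62) would select register 9 in the reg field, which is the case the specification would need to define.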

Jan
Alan Modra via Binutils July 6, 2020, 3:36 a.m. | #28
Hi Jan,

I have made the following changes for AMX; could you please take a look? Thanks.

1. Add a tmmword check in match_simd_size ().
2. Both BEXTR and AMX are now encoded with "SwapSources"; remove "VEXOP3".
3. Put the tmmword check ahead of all pre-existing ones. Before, it was placed between xmmword and ymmword.
4. Put the RegIP check for non-vector SIB together with the existing MPX check.
5. In the two places where comment and code were not in sync, adjust them according to the actual situation.
6. Add checks for incorrect VEX.W and VEX.L on AMX instructions, and add a tmm check to intel-regs.
7. Use "generic_mode = 0" instead of void_mode.
8. Change enumerator names from "LEN_VEX_*" to "VEX_LEN_*",
     and make similar modifications to X86_64_VEX_0F3849_MOD_3_REG_0_RM_0_LEN_0.

Subject: [PATCH] x86: Add support for Intel AMX instructions

gas/
	* doc/c-i386.texi: Document amx_int8, amx_bf16 and amx_tile.
	* config/tc-i386.c (i386_error): Add invalid_sib_address.
	(cpu_arch): Add .amx_int8, .amx_bf16 and .amx_tile.
	(cpu_noarch): Add noamx_int8, noamx_bf16 and noamx_tile.
	(match_simd_size): Add tmmword check.
	(operand_type_match): Add tmmword.
	(type_names): Add rTMM.
	(check_VecOperands): Handle invalid_sib_address.
	(match_template): Handle invalid_sib_address.
	(build_modrm_byte): Handle non-vector SIB and tmmword.
	(i386_index_check): Disallow RegIP for non-vector SIB.
	(check_register): Handle zmmword.
	* testsuite/gas/i386/i386.exp: Add AMX new tests.
	* testsuite/gas/i386/intel-regs.d: Add tmm check.
	* testsuite/gas/i386/intel-regs.s: Add tmm check.
	* testsuite/gas/i386/x86-64-amx-intel.d: New.
	* testsuite/gas/i386/x86-64-amx-inval.l: New.
	* testsuite/gas/i386/x86-64-amx-inval.s: New.
	* testsuite/gas/i386/x86-64-amx.d: New.
	* testsuite/gas/i386/x86-64-amx.s: New.

opcodes/
	* i386-dis.c (EV): New for generic memory operand.
	(XMT): New.
	(EXtmm): Likewise.
	(Vextmm): Likewise.
	(generic_mode): Likewise.
	(tmm_mode): Likewise.
	(REG_VEX_0F3849_P_0_W_0_M_3): Likewise.
	(MOD_VEX_0F3849_P_0_W_0): Likewise.
	(MOD_VEX_0F3849_P_2_W_0): Likewise.
	(MOD_VEX_0F3849_P_3_W_0): Likewise.
	(MOD_VEX_0F384B_P_1_W_0): Likewise.
	(MOD_VEX_0F384B_P_2_W_0): Likewise.
	(MOD_VEX_0F384B_P_3_W_0): Likewise.
	(MOD_VEX_0F385C_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_0_W_0): Likewise.
	(MOD_VEX_0F385E_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_2_W_0): Likewise.
	(MOD_VEX_0F385E_P_3_W_0): Likewise.
	(RM_VEX_0F3849_P_0_W_0_M_3_R_0): Likewise.
	(PREFIX_VEX_0F3849): Likewise.
	(PREFIX_VEX_0F384B): Likewise.
	(PREFIX_VEX_0F385C): Likewise.
	(PREFIX_VEX_0F385E): Likewise.
	(X86_64_0F01_REG_3): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_MOD_3_REG_0_RM_0_LEN_0): Likewise.
	(X86_64_VEX_0F3849_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385C_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_3_W_0_M_0_L_0): Likewise.
	(VEX_W_0F3849_P_0): Likewise.
	(VEX_W_0F3849_P_2): Likewise.
	(VEX_W_0F3849_P_3): Likewise.
	(VEX_W_0F384B_P_1): Likewise.
	(VEX_W_0F384B_P_2): Likewise.
	(VEX_W_0F384B_P_3): Likewise.
	(VEX_W_0F385C_P_1): Likewise.
	(VEX_W_0F385E_P_0): Likewise.
	(VEX_W_0F385E_P_1): Likewise.
	(VEX_W_0F385E_P_2): Likewise.
	(VEX_W_0F385E_P_3): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_3_REG_0_RM_0): Likewise.
	(VEX_LEN_0F3849_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F385C_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_3_W_0_M_0): Likewise.
	(names_tmm): Likewise.
	(att_names_tmm): Likewise.
	(intel_operand_size): Handle void_mode.
	(OP_XMM): Handle tmm_mode.
	(OP_EX): Likewise.
	(OP_VEX): Likewise.
	* i386-gen.c (cpu_flag_init): Add entries for
	CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(operand_type_shorthands): Add RegTMM.
	(operand_type_init): Likewise.
	(operand_types): Add Tmmword.
	(cpu_flag_init): Add CPU_AMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(cpu_flags): Add CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	* i386-opc.h (CpuAMX_INT8): New.
	(CpuAMX_BF16): Likewise.
	(CpuAMX_TILE): Likewise.
	(SIBMEM): Likewise.
	(Tmmword): Likewise.
	(i386_cpu_flags): Add cpuamx_int8, cpuamx_bf16 and cpuamx_tile.
	(i386_opcode_modifier): Extend width of fields vexvvvv and sib.
	(i386_operand_type): Add tmmword.
	* i386-opc.tbl: Add AMX instructions.
	* i386-reg.tbl: Add AMX registers.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
---
 gas/config/tc-i386.c                      |  75 ++++-
 gas/doc/c-i386.texi                       |   7 +
 gas/testsuite/gas/i386/i386.exp           |   3 +
 gas/testsuite/gas/i386/intel-regs.d       |   4 +
 gas/testsuite/gas/i386/intel-regs.s       |   4 +
 gas/testsuite/gas/i386/x86-64-amx-intel.d |  76 +++++
 gas/testsuite/gas/i386/x86-64-amx-inval.l |   9 +
 gas/testsuite/gas/i386/x86-64-amx-inval.s |  14 +
 gas/testsuite/gas/i386/x86-64-amx.d       |  76 +++++
 gas/testsuite/gas/i386/x86-64-amx.s       |  68 ++++
 opcodes/i386-dis.c                        | 384 +++++++++++++++++++++-
 opcodes/i386-gen.c                        |  18 +
 opcodes/i386-opc.h                        |  16 +-
 opcodes/i386-opc.tbl                      |  23 ++
 opcodes/i386-reg.tbl                      |   9 +
 15 files changed, 767 insertions(+), 19 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 2e0eb24753..0d5afb5b32 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -290,6 +290,7 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
     unsupported_vector_index_register,
@@ -372,6 +373,9 @@ struct _i386_insn
     /* Has ZMM register operands.  */
     bfd_boolean has_regzmm;
 
+    /* Has TMM register operands.  */
+    bfd_boolean has_regtmm;
+
     /* Has GOTPC or TLS relocation.  */
     bfd_boolean has_gotpc_tls_reloc;
 
@@ -1201,6 +1205,12 @@ static const arch_entry cpu_arch[] =
     CPU_WAITPKG_FLAGS, 0 },
   { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,
     CPU_CLDEMOTE_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_int8"), PROCESSOR_UNKNOWN,
+    CPU_AMX_INT8_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_bf16"), PROCESSOR_UNKNOWN,
+    CPU_AMX_BF16_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_tile"), PROCESSOR_UNKNOWN,
+    CPU_AMX_TILE_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdiri"), PROCESSOR_UNKNOWN,
     CPU_MOVDIRI_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdir64b"), PROCESSOR_UNKNOWN,
@@ -1259,6 +1269,9 @@ static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },
   { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },
   { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },
+  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },
+  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },
+  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },
   { STRING_COMMA_LEN ("nomovdiri"), CPU_ANY_MOVDIRI_FLAGS },
   { STRING_COMMA_LEN ("nomovdir64b"), CPU_ANY_MOVDIR64B_FLAGS },
   { STRING_COMMA_LEN ("noavx512_bf16"), CPU_ANY_AVX512_BF16_FLAGS },
@@ -2159,7 +2172,9 @@ match_simd_size (const insn_template *t, unsigned int wanted,
 	   || (i.types[given].bitfield.ymmword
 	       && !t->operand_types[wanted].bitfield.ymmword)
 	   || (i.types[given].bitfield.zmmword
-	       && !t->operand_types[wanted].bitfield.zmmword));
+	       && !t->operand_types[wanted].bitfield.zmmword)
+	   || (i.types[given].bitfield.tmmword
+	       && !t->operand_types[wanted].bitfield.tmmword));
 }
 
 /* Return 1 if there is no conflict in any size between operand GIVEN
@@ -2296,6 +2311,7 @@ operand_type_match (i386_operand_type overlap,
   temp.bitfield.xmmword = 0;
   temp.bitfield.ymmword = 0;
   temp.bitfield.zmmword = 0;
+  temp.bitfield.tmmword = 0;
   if (operand_type_all_zero (&temp))
     goto mismatch;
 
@@ -3304,6 +3320,7 @@ const type_names[] =
   { OPERAND_TYPE_REGXMM, "rXMM" },
   { OPERAND_TYPE_REGYMM, "rYMM" },
   { OPERAND_TYPE_REGZMM, "rZMM" },
+  { OPERAND_TYPE_REGTMM, "rTMM" },
   { OPERAND_TYPE_REGMASK, "Mask reg" },
 };
 
@@ -5790,7 +5807,7 @@ check_VecOperands (const insn_template *t)
 
   /* For VSIB byte, we need a vector register for index, and all vector
      registers must be distinct.  */
-  if (t->opcode_modifier.sib)
+  if (t->opcode_modifier.sib && t->opcode_modifier.sib != SIBMEM)
     {
       if (!i.index_reg
 	  || !((t->opcode_modifier.sib == VECSIB128
@@ -6584,6 +6601,9 @@ match_template (char mnem_suffix)
 	  as_bad (_("unsupported instruction `%s'"),
 		  current_templates->start->name);
 	  return NULL;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
 	case invalid_vsib_address:
 	  err_msg = _("invalid VSIB address");
 	  break;
@@ -7923,8 +7943,11 @@ build_modrm_byte (void)
 	  else if (i.op[dest].regs->reg_type.bitfield.class == RegSIMD
 		   || i.op[source].regs->reg_type.bitfield.class == RegSIMD)
 	    {
-	      if (i.types[dest].bitfield.zmmword
-		  || i.types[source].bitfield.zmmword)
+	      if (i.types[dest].bitfield.tmmword
+		  || i.types[source].bitfield.tmmword)
+		i.has_regtmm = TRUE;
+	      else if (i.types[dest].bitfield.zmmword
+		       || i.types[source].bitfield.zmmword)
 		i.has_regzmm = TRUE;
 	      else if (i.types[dest].bitfield.ymmword
 		       || i.types[source].bitfield.ymmword)
@@ -7966,7 +7989,9 @@ build_modrm_byte (void)
 
 	  if (i.tm.opcode_modifier.sib)
 	    {
-	      if (i.index_reg->reg_num == RegIZ)
+	      /* The index register of VSIB shouldn't be RegIZ.  */
+	      if (i.tm.opcode_modifier.sib != SIBMEM
+		  && i.index_reg->reg_num == RegIZ)
 		abort ();
 
 	      i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
@@ -7989,8 +8014,19 @@ build_modrm_byte (void)
 		      i.types[op].bitfield.disp32s = 1;
 		    }
 		}
-	      i.sib.index = i.index_reg->reg_num;
-	      set_rex_vrex (i.index_reg, REX_X, FALSE);
+
+	      /* A mandatory SIB always has an index register, so the
+		 code logic for it is unchanged.  A non-mandatory SIB
+		 without an index register is allowed and is handled
+		 later.  */
+	      if (i.index_reg)
+		{
+		  if (i.index_reg->reg_num == RegIZ)
+		    i.sib.index = NO_INDEX_REGISTER;
+		  else
+		    i.sib.index = i.index_reg->reg_num;
+		  set_rex_vrex (i.index_reg, REX_X, FALSE);
+		}
 	    }
 
 	  default_seg = &ds;
@@ -8004,7 +8040,9 @@ build_modrm_byte (void)
 		{
 		  i386_operand_type newdisp;
 
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* VSIB must not reach here; mandatory non-vector SIB may.  */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);
 		  /* Operand is just <disp>  */
 		  if (flag_code == CODE_64BIT)
 		    {
@@ -8142,7 +8180,11 @@ build_modrm_byte (void)
 	      i.sib.scale = i.log2_scale_factor;
 	      if (i.index_reg == 0)
 		{
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only VSIB requires an index register.  */
+		  gas_assert (i.tm.opcode_modifier.sib != VECSIB128
+			      && i.tm.opcode_modifier.sib != VECSIB256
+			      && i.tm.opcode_modifier.sib != VECSIB512);
+
 		  /* <disp>(%esp) becomes two byte modrm with no index
 		     register.  We've already stored the code for esp
 		     in i.rm.regmem ie. ESCAPE_TO_TWO_BYTE_ADDRESSING.
@@ -8267,7 +8309,9 @@ build_modrm_byte (void)
 		break;
 	      if (i.types[op].bitfield.class == RegSIMD)
 		{
-		  if (i.types[op].bitfield.zmmword)
+		  if (i.types[op].bitfield.tmmword)
+		    i.has_regtmm = TRUE;
+		  else if (i.types[op].bitfield.zmmword)
 		    i.has_regzmm = TRUE;
 		  else if (i.types[op].bitfield.ymmword)
 		    i.has_regymm = TRUE;
@@ -10926,9 +10970,10 @@ i386_index_check (const char *operand_string)
 		      || !i.index_reg->reg_type.bitfield.baseindex)))
 	    goto bad_address;
 
-	  /* bndmk, bndldx, and bndstx have special restrictions. */
+	  /* bndmk, bndldx, bndstx and mandatory non-vector SIB have special restrictions. */
 	  if (current_templates->start->base_opcode == 0xf30f1b
-	      || (current_templates->start->base_opcode & ~1) == 0x0f1a)
+	      || (current_templates->start->base_opcode & ~1) == 0x0f1a
+	      || current_templates->start->opcode_modifier.sib == SIBMEM)
 	    {
 	      /* They cannot use RIP-relative addressing. */
 	      if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -10939,6 +10984,7 @@ i386_index_check (const char *operand_string)
 
 	      /* bndldx and bndstx ignore their scale factor. */
 	      if (current_templates->start->base_opcode != 0xf30f1b
+		  && current_templates->start->opcode_modifier.sib != SIBMEM
 		  && i.log2_scale_factor)
 		as_warn (_("register scaling is being ignored here"));
 	    }
@@ -12440,6 +12486,11 @@ static bfd_boolean check_register (const reg_entry *r)
 	}
     }
 
+  if (r->reg_type.bitfield.tmmword
+      && (!cpu_arch_flags.bitfield.cpuamx_tile
+	  || flag_code != CODE_64BIT))
+    return FALSE;
+
   if (r->reg_type.bitfield.class == RegBND && !cpu_arch_flags.bitfield.cpumpx)
     return FALSE;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index d4e6fcb698..cb86cc7968 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -226,6 +226,12 @@ accept various extension mnemonics.  For example,
 @code{noenqcmd},
 @code{noserialize},
 @code{notsxldtrk},
+@code{amx_int8},
+@code{noamx_int8},
+@code{amx_bf16},
+@code{noamx_bf16},
+@code{amx_tile},
+@code{noamx_tile},
 @code{vmx},
 @code{vmfunc},
 @code{smx},
@@ -1504,6 +1510,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
+@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 55929d3acb..d3c15769fb 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -1137,6 +1137,9 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"
+    run_list_test "x86-64-amx-inval"
+    run_dump_test "x86-64-amx"
+    run_dump_test "x86-64-amx-intel"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/intel-regs.d b/gas/testsuite/gas/i386/intel-regs.d
index 65bcb6ca7d..480b291c91 100644
--- a/gas/testsuite/gas/i386/intel-regs.d
+++ b/gas/testsuite/gas/i386/intel-regs.d
@@ -6,6 +6,7 @@
 
 Disassembly of section \.text:
 0+0 <.*>:
+.*[ 	]+R_386_32[ 	]+tmm1
 .*[ 	]+R_386_16[ 	]+eax
 .*[ 	]+R_386_16[ 	]+rax
 .*[ 	]+R_386_16[ 	]+axl
@@ -53,4 +54,7 @@ Disassembly of section \.text:
 
 .* <ymm8>:
 .*[ 	]+<ymm8>
+
+.* <tmm0>:
+.*[ 	]+<tmm0>
 #pass
diff --git a/gas/testsuite/gas/i386/intel-regs.s b/gas/testsuite/gas/i386/intel-regs.s
index 66ab16dfc5..44e369bb0f 100644
--- a/gas/testsuite/gas/i386/intel-regs.s
+++ b/gas/testsuite/gas/i386/intel-regs.s
@@ -1,6 +1,8 @@
 	.text
 	.intel_syntax noprefix
 
+	mov	eax, tmm1
+
 	.arch i286
 	.code16
 	mov	ax, eax			; add	[bx+si], al
@@ -59,3 +61,5 @@
 	mov	rax, r8
 ymm8:
 	jmp	ymm8
+tmm0:
+	jmp	tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx-intel.d b/gas/testsuite/gas/i386/x86-64-amx-intel.d
new file mode 100644
index 0000000000..bb9b442a98
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d
@@ -0,0 +1,76 @@
+#as:
+#objdump: -d -Mintel
+#name: x86_64 AMX insns in Intel syntax
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+
+[a-f0-9]+ <bad>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 f8 5e[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*d1 90 90 90 90 90[ 	]*rcl.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7c 5e[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*d1.*
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.l b/gas/testsuite/gas/i386/x86-64-amx-inval.l
new file mode 100644
index 0000000000..e383a455b4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.l
@@ -0,0 +1,9 @@
+.* Assembler messages:
+.*:5: Error: `\(%rip\)' cannot be used here
+.*:6: Error: `\(%rip\)' cannot be used here
+.*:7: Error: `\(%rip\)' cannot be used here
+.*:8: Error: operand size mismatch for `tdpbssd'
+.*:11: Error: `\[rip\]' cannot be used here
+.*:12: Error: `\[rip\]' cannot be used here
+.*:13: Error: `\[rip\]' cannot be used here
+.*:14: Error: operand size mismatch for `tdpbssd'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.s b/gas/testsuite/gas/i386/x86-64-amx-inval.s
new file mode 100644
index 0000000000..e2d61d32ae
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.s
@@ -0,0 +1,14 @@
+# Check invalid operands for AMX instructions (RIP-relative SIBMEM, non-TMM registers)
+
+    .text
+_start:
+    tileloadd (%rip), %tmm1
+    tileloaddt1 (%rip), %tmm1
+    tilestored %tmm1, (%rip)
+    tdpbssd %xmm1, %xmm2, %xmm3
+
+    .intel_syntax noprefix
+    tileloadd tmm1, [rip]
+    tileloaddt1 tmm1, [rip]
+    tilestored [rip], tmm1
+    tdpbssd xmm3, xmm2, xmm1
diff --git a/gas/testsuite/gas/i386/x86-64-amx.d b/gas/testsuite/gas/i386/x86-64-amx.d
new file mode 100644
index 0000000000..628d3475b7
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.d
@@ -0,0 +1,76 @@
+#as:
+#objdump: -d
+#name: x86_64 AMX insns
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+
+[a-f0-9]+ <bad>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 f8 5e[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*d1 90 90 90 90 90[ 	]*rcll.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7c 5e[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*d1.*
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx.s b/gas/testsuite/gas/i386/x86-64-amx.s
new file mode 100644
index 0000000000..17aa9e955c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.s
@@ -0,0 +1,68 @@
+
+  .allow_index_reg
+  .text
+_start:
+  ldtilecfg  (%rcx,%rdx,2)
+  sttilecfg  (%rcx,%rdx,2)
+  tdpbf16ps %tmm5, %tmm4, %tmm3
+  tdpbssd %tmm3, %tmm2, %tmm1
+  tdpbsud %tmm3, %tmm2, %tmm1
+  tdpbusd %tmm3, %tmm2, %tmm1
+  tdpbuud %tmm3, %tmm2, %tmm1
+  tileloadd foo, %tmm5
+  tileloadd (%rcx), %tmm5
+  tileloadd (%ecx), %tmm5
+  tileloadd (%rcx,%rdx,1), %tmm5
+  tileloadd (%ecx,%edx,2), %tmm1
+  tileloaddt1 foo, %tmm5
+  tileloaddt1 (%rcx), %tmm5
+  tileloaddt1 (%ecx), %tmm5
+  tileloaddt1 (%rcx,%rdx,1), %tmm5
+  tileloaddt1 (%ecx,%edx,2), %tmm1
+  tileloaddt1 (%rcx,%riz,2), %tmm1
+  tilerelease
+  tilestored %tmm5, (%rcx)
+  tilestored %tmm5, (%ecx)
+  tilestored %tmm5, (%rcx,%rdx,1)
+  tilestored %tmm1, (%ecx,%edx,2)
+  tilezero %tmm0
+  tilezero %tmm5
+  tilezero %tmm7
+
+
+  .intel_syntax noprefix
+  ldtilecfg  [rcx]
+  ldtilecfg  [rbx]
+  sttilecfg  [rcx]
+  sttilecfg  [rbx]
+  tdpbf16ps tmm3, tmm4, tmm5
+  tdpbssd tmm1, tmm2, tmm3
+  tdpbsud tmm1, tmm2, tmm3
+  tdpbusd tmm1, tmm2, tmm3
+  tdpbuud tmm1, tmm2, tmm3
+  tileloadd tmm5, foo
+  tileloadd tmm5, [rcx]
+  tileloadd tmm5, [ecx]
+  tileloadd tmm5, [rcx+rdx]
+  tileloadd tmm1, [ecx+edx*2]
+  tileloaddt1 tmm5, foo
+  tileloaddt1 tmm5, [rcx]
+  tileloaddt1 tmm5, [ecx]
+  tileloaddt1 tmm5, [rcx+rdx]
+  tileloaddt1 tmm1, [ecx+edx*2]
+  tileloaddt1 tmm1, [rcx+riz*2]
+  tilerelease
+  tilestored [rcx], tmm5
+  tilestored [ecx], tmm5
+  tilestored [rcx+rdx], tmm5
+  tilestored [ecx+edx*2], tmm1
+  tilezero tmm0
+  tilezero tmm5
+  tilezero tmm7
+
+bad:
+    # tdpbuud %tmm0,%tmm1,%tmm2 with VEX.W set to an illegal value.
+    .byte 0xc4, 0xe2, 0xf8, 0x5e, 0xd1
+    .fill 0x5, 0x1, 0x90
+    # tdpbuud %tmm0,%tmm1,%tmm2 with VEX.L set to an illegal value.
+    .byte 0xc4, 0xe2, 0x7c, 0x5e, 0xd1
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e1ebb48553..d6ab023b23 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -244,6 +244,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Bad_Opcode NULL, { { NULL, 0 } }, 0
 
 #define Eb { OP_E, b_mode }
+#define EV { OP_E, generic_mode }
 #define Ebnd { OP_E, bnd_mode }
 #define EbS { OP_E, b_swap_mode }
 #define EbndS { OP_E, bnd_swap_mode }
@@ -374,6 +375,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define XMScalar { OP_XMM, scalar_mode }
 #define XMGatherQ { OP_XMM, vex_vsib_q_w_dq_mode }
 #define XMM { OP_XMM, xmm_mode }
+#define XMT { OP_XMM, tmm_mode }
 #define XMxmmq { OP_XMM, xmmq_mode }
 #define EM { OP_EM, v_mode }
 #define EMS { OP_EM, v_swap_mode }
@@ -393,6 +395,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXxS { OP_EX, x_swap_mode }
 #define EXxmm { OP_EX, xmm_mode }
 #define EXymm { OP_EX, ymm_mode }
+#define EXtmm { OP_EX, tmm_mode }
 #define EXxmmq { OP_EX, xmmq_mode }
 #define EXEvexHalfBcstXmmq { OP_EX, evex_half_bcst_xmmq_mode }
 #define EXxmm_mb { OP_EX, xmm_mb_mode }
@@ -423,6 +426,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Vex128 { OP_VEX, vex128_mode }
 #define Vex256 { OP_VEX, vex256_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define Vextmm { OP_VEX, tmm_mode }
 #define EXdVexScalarS { OP_EX_Vex, d_scalar_swap_mode }
 #define EXqVexScalarS { OP_EX_Vex, q_scalar_swap_mode }
 #define EXVexW { OP_EX_VexW, x_mode }
@@ -484,8 +488,10 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 
 enum
 {
+  /* A generic memory operand. */
+  generic_mode = 0,
   /* byte operand */
-  b_mode = 1,
+  b_mode,
   /* byte operand with operand swapped */
   b_swap_mode,
   /* byte operand, sign extend like 'T' suffix */
@@ -544,6 +550,8 @@ enum
   ymmq_mode,
   /* 32-byte YMM or 16-byte word operand */
   ymmxmm_mode,
+  /* TMM operand */
+  tmm_mode,
   /* d_mode in 32bit, q_mode in 64bit mode.  */
   m_mode,
   /* pair of v_mode operands */
@@ -749,6 +757,7 @@ enum
   REG_VEX_0F72,
   REG_VEX_0F73,
   REG_VEX_0FAE,
+  REG_VEX_0F3849_P_0_W_0_M_3,
   REG_VEX_0F38F3,
   REG_XOP_LWPCB,
   REG_XOP_LWP,
@@ -832,6 +841,17 @@ enum
   MOD_0FE7_PREFIX_2,
   MOD_0FF0_PREFIX_3,
   MOD_0F382A_PREFIX_2,
+  MOD_VEX_0F3849_P_0_W_0,
+  MOD_VEX_0F3849_P_2_W_0,
+  MOD_VEX_0F3849_P_3_W_0,
+  MOD_VEX_0F384B_P_1_W_0,
+  MOD_VEX_0F384B_P_2_W_0,
+  MOD_VEX_0F384B_P_3_W_0,
+  MOD_VEX_0F385C_P_1_W_0,
+  MOD_VEX_0F385E_P_0_W_0,
+  MOD_VEX_0F385E_P_1_W_0,
+  MOD_VEX_0F385E_P_2_W_0,
+  MOD_VEX_0F385E_P_3_W_0,
   MOD_0F38F5_PREFIX_2,
   MOD_0F38F6_PREFIX_0,
   MOD_0F38F8_PREFIX_1,
@@ -961,6 +981,7 @@ enum
   RM_0F1E_P_1_MOD_3_REG_7,
   RM_0FAE_REG_6_MOD_3_P_0,
   RM_0FAE_REG_7_MOD_3,
+  RM_VEX_0F3849_P_0_W_0_M_3_R_0
 };
 
 enum
@@ -1296,9 +1317,13 @@ enum
   PREFIX_VEX_0F3845,
   PREFIX_VEX_0F3846,
   PREFIX_VEX_0F3847,
+  PREFIX_VEX_0F3849,
+  PREFIX_VEX_0F384B,
   PREFIX_VEX_0F3858,
   PREFIX_VEX_0F3859,
   PREFIX_VEX_0F385A,
+  PREFIX_VEX_0F385C,
+  PREFIX_VEX_0F385E,
   PREFIX_VEX_0F3878,
   PREFIX_VEX_0F3879,
   PREFIX_VEX_0F388C,
@@ -1767,7 +1792,19 @@ enum
   X86_64_0F01_REG_0,
   X86_64_0F01_REG_1,
   X86_64_0F01_REG_2,
-  X86_64_0F01_REG_3
+  X86_64_0F01_REG_3,
+  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_0_W_0_M_3_REG_0_RM_0_L_0,
+  X86_64_VEX_0F3849_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F385C_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_3_W_0_M_0_L_0
 };
 
 enum
@@ -1852,7 +1889,19 @@ enum
   VEX_LEN_0F381A_P_2_M_0,
   VEX_LEN_0F3836_P_2,
   VEX_LEN_0F3841_P_2,
+  VEX_LEN_0F3849_P_0_W_0_M_0,
+  VEX_LEN_0F3849_P_0_W_0_M_3_REG_0_RM_0,
+  VEX_LEN_0F3849_P_2_W_0_M_0,
+  VEX_LEN_0F3849_P_3_W_0_M_0,
+  VEX_LEN_0F384B_P_1_W_0_M_0,
+  VEX_LEN_0F384B_P_2_W_0_M_0,
+  VEX_LEN_0F384B_P_3_W_0_M_0,
   VEX_LEN_0F385A_P_2_M_0,
+  VEX_LEN_0F385C_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_0_W_0_M_0,
+  VEX_LEN_0F385E_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_2_W_0_M_0,
+  VEX_LEN_0F385E_P_3_W_0_M_0,
   VEX_LEN_0F38DB_P_2,
   VEX_LEN_0F38F2_P_0,
   VEX_LEN_0F38F3_R_1_P_0,
@@ -2006,9 +2055,20 @@ enum
   VEX_W_0F382F_P_2_M_0,
   VEX_W_0F3836_P_2,
   VEX_W_0F3846_P_2,
+  VEX_W_0F3849_P_0,
+  VEX_W_0F3849_P_2,
+  VEX_W_0F3849_P_3,
+  VEX_W_0F384B_P_1,
+  VEX_W_0F384B_P_2,
+  VEX_W_0F384B_P_3,
   VEX_W_0F3858_P_2,
   VEX_W_0F3859_P_2,
   VEX_W_0F385A_P_2_M_0,
+  VEX_W_0F385C_P_1,
+  VEX_W_0F385E_P_0,
+  VEX_W_0F385E_P_1,
+  VEX_W_0F385E_P_2,
+  VEX_W_0F385E_P_3,
   VEX_W_0F3878_P_2,
   VEX_W_0F3879_P_2,
   VEX_W_0F38CF_P_2,
@@ -3153,6 +3213,16 @@ static const char *att_names_zmm[] = {
   "%zmm28", "%zmm29", "%zmm30", "%zmm31"
 };
 
+static const char **names_tmm;
+static const char *intel_names_tmm[] = {
+  "tmm0", "tmm1", "tmm2", "tmm3",
+  "tmm4", "tmm5", "tmm6", "tmm7"
+};
+static const char *att_names_tmm[] = {
+  "%tmm0", "%tmm1", "%tmm2", "%tmm3",
+  "%tmm4", "%tmm5", "%tmm6", "%tmm7"
+};
+
 static const char **names_mask;
 static const char *intel_names_mask[] = {
   "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"
@@ -3521,6 +3591,10 @@ static const struct dis386 reg_table[][8] = {
     { MOD_TABLE (MOD_VEX_0FAE_REG_2) },
     { MOD_TABLE (MOD_VEX_0FAE_REG_3) },
   },
+  /* REG_VEX_0F3849_P_0_W_0_M_3 */
+  {
+    { RM_TABLE (RM_VEX_0F3849_P_0_W_0_M_3_R_0) },
+  },
   /* REG_VEX_0F38F3 */
   {
     { Bad_Opcode },
@@ -5902,6 +5976,22 @@ static const struct dis386 prefix_table[][4] = {
     { "vpsllv%LW", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F3849 */
+  {
+    { VEX_W_TABLE (VEX_W_0F3849_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3849_P_2) },
+    { VEX_W_TABLE (VEX_W_0F3849_P_3) },
+  },
+
+  /* PREFIX_VEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384B_P_1) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_2) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_3) },
+  },
+
   /* PREFIX_VEX_0F3858 */
   {
     { Bad_Opcode },
@@ -5923,6 +6013,21 @@ static const struct dis386 prefix_table[][4] = {
     { MOD_TABLE (MOD_VEX_0F385A_PREFIX_2) },
   },
 
+  /* PREFIX_VEX_0F385C */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F385C_P_1) },
+    { Bad_Opcode },
+  },
+
+  /* PREFIX_VEX_0F385E */
+  {
+    { VEX_W_TABLE (VEX_W_0F385E_P_0) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_1) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_2) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_3) },
+  },
+
   /* PREFIX_VEX_0F3878 */
   {
     { Bad_Opcode },
@@ -6938,6 +7043,78 @@ static const struct dis386 x86_64_table[][2] = {
     { "lidt{Q|Q}", { M }, 0 },
     { "lidt", { M }, 0 },
   },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "ldtilecfg", { EV }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_3_REG_0_RM_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilerelease", { Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "sttilecfg", { EV }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilezero", { XMT, Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilestored", { EV, XMT }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloaddt1", { XMT, EV }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloadd", { XMT, EV }, 0 },
+  },
+
+  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbf16ps", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbuud", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbsud", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbusd", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbssd", { XMT, EXtmm, Vextmm }, 0 },
+  },
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8779,9 +8956,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3847) },
     /* 48 */
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -8800,9 +8977,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3859) },
     { PREFIX_TABLE (PREFIX_VEX_0F385A) },
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385C) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385E) },
     { Bad_Opcode },
     /* 60 */
     { Bad_Opcode },
@@ -9540,12 +9717,72 @@ static const struct dis386 vex_len_table[][2] = {
     { "vphminposuw",	{ XM, EXx }, 0 },
   },
 
+  /* VEX_LEN_0F3849_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_0_W_0_M_3_REG_0_RM_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_3_REG_0_RM_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_3_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F385A_P_2_M_0 */
   {
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F385A_P_2_M_0) },
   },
 
+  /* VEX_LEN_0F385C_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385C_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F38DB_P_2 */
   {
     { "vaesimc",	{ XM, EXx }, 0 },
@@ -10036,6 +10273,30 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3846_P_2 */
     { "vpsravd",	{ XM, Vex, EXx }, 0 },
   },
+  {
+    /* VEX_W_0F3849_P_0 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_2 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_3 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_1 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_2 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_3 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3858_P_2 */
     { "vpbroadcastd", { XM, EXxmm_md }, 0 },
@@ -10048,6 +10309,26 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F385A_P_2_M_0 */
     { "vbroadcasti128", { XM, Mxmm }, 0 },
   },
+  {
+    /* VEX_W_0F385C_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385C_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_0 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_2 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_3 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3878_P_2 */
     { "vpbroadcastb",	{ XM, EXxmm_mb }, 0 },
@@ -10474,6 +10755,57 @@ static const struct dis386 mod_table[][2] = {
     /* MOD_0F382A_PREFIX_2 */
     { "movntdqa",	{ XM, Mx }, 0 },
   },
+  {
+    /* MOD_VEX_0F3849_P_0_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_0) },
+    { REG_TABLE (REG_VEX_0F3849_P_0_W_0_M_3) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_1_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_3_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385C_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385C_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_0_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_0_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_2_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_3_W_0_M_0) },
+  },
   {
     /* MOD_0F38F5_PREFIX_2 */
     { "wrussK",		{ M, Gdq }, PREFIX_OPCODE },
@@ -11035,6 +11367,10 @@ static const struct dis386 rm_table[][8] = {
     { "sfence",		{ Skip_MODRM }, 0 },
 
   },
+  {
+    /* RM_VEX_0F3849_P_0_W_0_M_3_R_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_3_REG_0_RM_0) },
+  },
 };
 
 #define INTERNAL_DISASSEMBLER_ERROR _("<internal disassembler error>")
@@ -11926,6 +12262,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = intel_names_xmm;
       names_ymm = intel_names_ymm;
       names_zmm = intel_names_zmm;
+      names_tmm = intel_names_tmm;
       index64 = intel_index64;
       index32 = intel_index32;
       names_mask = intel_names_mask;
@@ -11948,6 +12285,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = att_names_xmm;
       names_ymm = att_names_ymm;
       names_zmm = att_names_zmm;
+      names_tmm = att_names_tmm;
       index64 = att_index64;
       index32 = att_index32;
       names_mask = att_names_mask;
@@ -13451,6 +13789,8 @@ intel_operand_size (int bytemode, int sizeflag)
     }
   switch (bytemode)
     {
+    case generic_mode:
+      break;
     case b_mode:
     case b_swap_mode:
     case dqb_mode:
@@ -15172,6 +15512,7 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != scalar_mode)
     {
       switch (vex.length)
@@ -15210,6 +15551,16 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      names = names_tmm;
+    }
+
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15334,6 +15685,7 @@ OP_EX (int bytemode, int sizeflag)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != d_scalar_mode
       && bytemode != d_scalar_swap_mode
       && bytemode != q_scalar_mode
@@ -15371,6 +15723,15 @@ OP_EX (int bytemode, int sizeflag)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      names = names_tmm;
+    }
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15926,6 +16287,17 @@ OP_VEX (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       return;
     }
 
+  if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      oappend (names_tmm[reg]);
+      return;
+    }
+
   switch (vex.length)
     {
     case 128:
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 7230f87344..3334155071 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -297,6 +297,12 @@ static initializer cpu_flag_init[] =
     "CpuWAITPKG" },
   { "CPU_CLDEMOTE_FLAGS",
     "CpuCLDEMOTE" },
+  { "CPU_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_AMX_TILE_FLAGS",
+    "CpuAMX_TILE" },
   { "CPU_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_MOVDIR64B_FLAGS",
@@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =
     "CpuAVX512_BITALG" },
   { "CPU_ANY_AVX512_BF16_FLAGS",
     "CpuAVX512_BF16" },
+  { "CPU_ANY_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_ANY_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_ANY_AMX_TILE_FLAGS",
+    "CpuAMX_TILE|CpuAMX_INT8|CpuAMX_BF16" },
   { "CPU_ANY_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_ANY_MOVDIR64B_FLAGS",
@@ -459,6 +471,8 @@ static initializer operand_type_init[] =
     "Class=RegSIMD|Ymmword" },
   { "OPERAND_TYPE_REGZMM",
     "Class=RegSIMD|Zmmword" },
+  { "OPERAND_TYPE_REGTMM",
+    "Class=RegSIMD|Tmmword" },
   { "OPERAND_TYPE_REGMASK",
     "Class=RegMask" },
   { "OPERAND_TYPE_REGBND",
@@ -611,6 +625,9 @@ static bitfield cpu_flags[] =
   BITFIELD (CpuPCONFIG),
   BITFIELD (CpuWAITPKG),
   BITFIELD (CpuCLDEMOTE),
+  BITFIELD (CpuAMX_INT8),
+  BITFIELD (CpuAMX_BF16),
+  BITFIELD (CpuAMX_TILE),
   BITFIELD (CpuMOVDIRI),
   BITFIELD (CpuMOVDIR64B),
   BITFIELD (CpuENQCMD),
@@ -741,6 +758,7 @@ static bitfield operand_types[] =
   BITFIELD (Xmmword),
   BITFIELD (Ymmword),
   BITFIELD (Zmmword),
+  BITFIELD (Tmmword),
   BITFIELD (Unspecified),
 #ifdef OTUnused
   BITFIELD (OTUnused),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index c65febbe81..b8a6dfc25c 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -223,6 +223,12 @@ enum
   /* CET instructions support required */
   CpuIBT,
   CpuSHSTK,
+  /* AMX-INT8 instructions required */
+  CpuAMX_INT8,
+  /* AMX-BF16 instructions required */
+  CpuAMX_BF16,
+  /* AMX-TILE instructions required */
+  CpuAMX_TILE,
   /* GFNI instructions required */
   CpuGFNI,
   /* VAES instructions required */
@@ -372,6 +378,9 @@ typedef union i386_cpu_flags
       unsigned int cpuptwrite:1;
       unsigned int cpuibt:1;
       unsigned int cpushstk:1;
+      unsigned int cpuamx_int8:1;
+      unsigned int cpuamx_bf16:1;
+      unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
       unsigned int cpuvpclmulqdq:1;
@@ -574,7 +583,9 @@ enum
 #define VECSIB128	1
 #define VECSIB256	2
 #define VECSIB512	3
+#define SIBMEM		4
   SIB,
+
   /* SSE to AVX support required */
   SSE2AVX,
   /* No AVX equivalent */
@@ -702,7 +713,7 @@ typedef struct i386_opcode_modifier
   unsigned int vexw:2;
   unsigned int vexopcode:3;
   unsigned int vexsources:2;
-  unsigned int sib:2;
+  unsigned int sib:3;
   unsigned int sse2avx:1;
   unsigned int noavx:1;
   unsigned int evex:3;
@@ -807,6 +818,8 @@ enum
   Ymmword,
   /* ZMMWORD size.  */
   Zmmword,
+  /* TMMWORD size.  */
+  Tmmword,
   /* Unspecified memory size.  */
   Unspecified,
 
@@ -851,6 +864,7 @@ typedef union i386_operand_type
       unsigned int xmmword:1;
       unsigned int ymmword:1;
       unsigned int zmmword:1;
+      unsigned int tmmword:1;
       unsigned int unspecified:1;
 #ifdef OTUnused
       unsigned int unused:(OTNumOfBits - OTUnused);
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index cd6833c5ae..2a8ec52b41 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -52,6 +52,7 @@
 #define RegXMM Class=RegSIMD|Xmmword
 #define RegYMM Class=RegSIMD|Ymmword
 #define RegZMM Class=RegSIMD|Zmmword
+#define RegTMM Class=RegSIMD|Tmmword
 
 #define RegMask Class=RegMask
 
@@ -88,6 +89,7 @@
 #define VecSIB128 SIB=VECSIB128
 #define VecSIB256 SIB=VECSIB256
 #define VecSIB512 SIB=VECSIB512
+#define Sibmem SIB=SIBMEM|Modrm
 
 #define EVex128 EVex=EVEX128
 #define EVex256 EVex=EVEX256
@@ -4093,3 +4095,24 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|
 xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
 
 // TSXLDTRK instructions end.
+
+// AMX instructions.
+
+ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+
+tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+
+tileloadd, 2, 0xf24b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 2, 0x664b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 2, 0xf34b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }
+
+tilerelease, 0, 0x49c0, None, 2, CpuAMX_TILE|Cpu64, Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
+
+tilezero, 1, 0xf249, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM }
+
+// AMX instructions end.
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index cdff763ca7..ca7eeba488 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval
 zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval
 zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval
 zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval
+// TMM registers for AMX
+tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval
+tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval
+tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval
+tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval
+tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval
+tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval
+tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval
+tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval
 // Bound registers for MPX
 bnd0, Class=RegBND, 0, 0, Dw2Inval, Dw2Inval
 bnd1, Class=RegBND, 0, 1, Dw2Inval, Dw2Inval
-- 
2.17.1

Thanks,
Lili.
Jan Beulich July 6, 2020, 12:35 p.m. | #29
On 06.07.2020 05:36, Cui, Lili wrote:
> @@ -10926,9 +10970,10 @@ i386_index_check (const char *operand_string)
>  		      || !i.index_reg->reg_type.bitfield.baseindex)))
>  	    goto bad_address;
>  
> -	  /* bndmk, bndldx, and bndstx have special restrictions. */
> +	  /* bndmk, bndldx, bndstx and mandatory non-vector SIB have special restrictions. */
>  	  if (current_templates->start->base_opcode == 0xf30f1b
> -	      || (current_templates->start->base_opcode & ~1) == 0x0f1a)
> +	      || (current_templates->start->base_opcode & ~1) == 0x0f1a
> +	      || current_templates->start->opcode_modifier.sib == SIBMEM)


With this in view, isn't it possible to ...

> @@ -10939,6 +10984,7 @@ i386_index_check (const char *operand_string)
>  
>  	      /* bndldx and bndstx ignore their scale factor. */
>  	      if (current_templates->start->base_opcode != 0xf30f1b
> +		  && current_templates->start->opcode_modifier.sib != SIBMEM
>  		  && i.log2_scale_factor)
>  		as_warn (_("register scaling is being ignored here"));


... check "... & ~1) == 0x0f1a" here, instead of adding a second
condition (resulting in overall more simple code)?
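To make the suggestion concrete, here is a minimal standalone sketch of the two ways of deciding whether to emit the "register scaling is being ignored" warning. It is not code from tc-i386.c: `struct tmpl_view` and the three function names are hypothetical, and only the opcode constants and the `SIBMEM` value are taken from the patch. It also assumes that no bndldx/bndstx template ever carries `SIBMEM`; under that assumption both forms agree.

```c
#include <assert.h>
#include <stdbool.h>

enum { SIBMEM = 4 };	/* value from the patch's i386-opc.h hunk */

/* Hypothetical reduced view of the template fields the check reads.  */
struct tmpl_view
{
  unsigned int base_opcode;
  unsigned int sib;
};

/* Outer condition: bndmk, bndldx, bndstx, or a mandatory non-vector SIB.  */
static bool
has_special_restrictions (const struct tmpl_view *t)
{
  return t->base_opcode == 0xf30f1b
	 || (t->base_opcode & ~1u) == 0x0f1a
	 || t->sib == SIBMEM;
}

/* Inner condition as posted: two negative tests.  */
static bool
warn_scale_as_posted (const struct tmpl_view *t, unsigned int log2_scale)
{
  return has_special_restrictions (t)
	 && t->base_opcode != 0xf30f1b
	 && t->sib != SIBMEM
	 && log2_scale != 0;
}

/* Inner condition as suggested: one positive test for bndldx/bndstx.  */
static bool
warn_scale_as_suggested (const struct tmpl_view *t, unsigned int log2_scale)
{
  return has_special_restrictions (t)
	 && (t->base_opcode & ~1u) == 0x0f1a
	 && log2_scale != 0;
}
```

The positive test names exactly the two instructions that the comment in the hunk talks about, so nothing else needs to be excluded one by one.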

> --- a/gas/testsuite/gas/i386/i386.exp
> +++ b/gas/testsuite/gas/i386/i386.exp
> @@ -1137,6 +1137,9 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
>      run_dump_test "x86-64-lfence-ret-d"
>      run_dump_test "x86-64-lfence-ret-e"
>      run_dump_test "x86-64-lfence-byte"
> +    run_list_test "x86-64-amx-inval"
> +    run_dump_test "x86-64-amx"
> +    run_dump_test "x86-64-amx-intel"


There still look to be tests missing here that were previously
requested (proving correct [failing] decode of AMX insns with
VEX.W, .L, .R, .B set or the high bit of VEX.VVVV clear; the
latter few of course only if you and H.J. continue to object to
the addition of %tmm8...%tmm15, which I've pointed out before
the specification doesn't exclude).
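For readers following along, the fields in question can be pulled out of a three-byte VEX prefix as in the sketch below. This is an illustration only, not the i386-dis.c logic: `struct vex3`, `vex3_decode` and `amx_prefix_suspect` are hypothetical names. The bit layout is the standard 0xc4 VEX form, in which R, X, B and vvvv are stored inverted. Whether R, B or a vvvv value of 8 and above should really be rejected is exactly the open question about %tmm8...%tmm15, so the predicate only flags such encodings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Decoded (un-inverted) fields of a three-byte VEX prefix.  */
struct vex3
{
  unsigned int r, x, b;	/* register extension bits */
  unsigned int map;	/* opcode map, 2 selects 0F38 */
  unsigned int w;
  unsigned int vvvv;	/* extra source register number */
  unsigned int l;	/* vector length bit */
  unsigned int pp;	/* 0 none, 1 0x66, 2 0xf3, 3 0xf2 */
};

/* Decode the prefix starting at P.  Return false if P[0] is not 0xc4.  */
static bool
vex3_decode (const uint8_t *p, struct vex3 *v)
{
  if (p[0] != 0xc4)
    return false;
  v->r = !(p[1] & 0x80);
  v->x = !(p[1] & 0x40);
  v->b = !(p[1] & 0x20);
  v->map = p[1] & 0x1f;
  v->w = (p[2] >> 7) & 1;
  v->vvvv = ((p[2] >> 3) & 0xf) ^ 0xf;
  v->l = (p[2] >> 2) & 1;
  v->pp = p[2] & 3;
  return true;
}

/* True when a field is set that the requested tests want to see
   decoded as a failure for a register-only AMX insn.  */
static bool
amx_prefix_suspect (const struct vex3 *v)
{
  return v->w || v->l || v->r || v->b || v->vvvv >= 8;
}
```

Feeding it the prefixes `c4 e2 d2`, `c4 e2 56`, `c4 62 52` and `c4 c2 52` flags W, L, R and B respectively, each with vvvv decoding to 5.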

> @@ -6938,6 +7043,78 @@ static const struct dis386 x86_64_table[][2] = {
>      { "lidt{Q|Q}", { M }, 0 },
>      { "lidt", { M }, 0 },
>    },


With this even in context, ...

> +  /* X86_64_VEX_0F3849_P_0_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "ldtilecfg", { EV }, 0 },


... is there anything wrong with using M instead of the new EV here
and ...

> +  /* X86_64_VEX_0F3849_P_2_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "sttilecfg", { EV }, 0 },


... here (and again twice below)? New things should imo be
introduced only when there are no existing suitable items. If
you look at some of the cleanup I've been doing recently (and
more in the works), you'll notice how several pieces had been
introduced over time when there already was suitable logic
available.

> +  /* X86_64_VEX_0F3849_P_3_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tilezero", { XMT, Skip_MODRM }, 0 },


Taking this as an example: where does the "XMT" name come from?
Following existing XMM naming, I'd expect this to be either
TM (along the lines of XM) or TMM (along the lines of XMM).
(And again I can't help the impression that the two actually
perform the same thing.)

> +  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tdpbf16ps", { XMT, EXtmm, Vextmm }, 0 },


Along the lines of the above, EXt then (paralleling EXx)? For
the last operand here I'd suggest VexTmm or VexTMM.

> @@ -10474,6 +10755,57 @@ static const struct dis386 mod_table[][2] = {
>      /* MOD_0F382A_PREFIX_2 */
>      { "movntdqa",	{ XM, Mx }, 0 },
>    },
> +  {
> +    /* MOD_VEX_0F3849_P_0_W_0 */
> +    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_0) },
> +    { REG_TABLE (REG_VEX_0F3849_P_0_W_0_M_3) },


If you look at existing suffixes you'll find that we have _M_1
or _MOD_3, but not _M_3. Please try to conform to this. (_MOD_
should probably get replaced altogether.)

> @@ -13451,6 +13789,8 @@ intel_operand_size (int bytemode, int sizeflag)
>      }
>    switch (bytemode)
>      {
> +    case generic_mode:
> +      break;
>      case b_mode:

I don't see the need for this addition - the default case covers
it fine afaics.

Jan
Jan Beulich July 7, 2020, 6 a.m. | #30
On 06.07.2020 05:36, Cui, Lili wrote:
> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-amx-inval.s
> @@ -0,0 +1,14 @@
> +# Check for SIBMEM operand used in certain AMX instructions
> +
> +    .text
> +_start:
> +    tileloadd (%rip), %tmm1
> +    tileloaddt1 (%rip), %tmm1
> +    tilestored %tmm1, (%rip)
> +    tdpbssd %xmm1, %xmm2, %xmm3


You also want to check that errors result for any of

    tdpbssd %tmm1, %tmm1, %tmm0
    tdpbssd %tmm1, %tmm0, %tmm1
    tdpbssd %tmm0, %tmm1, %tmm1

(albeit I have to admit I don't really understand the first of the
restrictions - at least for square matrices both sources being the
same registers ought to be fine imo). Which in turn points out
that even right now the comment at the top of the file is already
stale.
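The three cases above share one property: two of the three tile operands name the same register. A minimal sketch of such a check follows, assuming the rule is simply "all three tile registers must be distinct". The function name is hypothetical, and this is not the check gas implements.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true when the destination and both source tile register
   numbers are pairwise different.  */
static bool
tdp_operands_distinct (unsigned int dst, unsigned int src1,
		       unsigned int src2)
{
  return dst != src1 && dst != src2 && src1 != src2;
}
```

With AT&T operand order the destination comes last, so `tdpbssd %tmm1, %tmm1, %tmm0` corresponds to `tdp_operands_distinct (0, 1, 1)`, which is false.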

I'd also recommend adding something like

    vaddps %tmm1, %tmm2, %tmm3

here or somewhere else; after all this is what I did suspect
might have passed without an error in the earlier version of the
patch.
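Why a tmmword clause in the size matching matters for a case like this can be shown with a reduced model. The sketch below is hypothetical: `struct simd_size` stands in for the size bits of `i386_operand_type`, the `check_tmm` flag models whether the tmmword clause is present, and the assumption that a vaddps template accepts only xmm and ymm sizes is made purely for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reduced stand-in for the SIMD size bits of an operand.  */
struct simd_size
{
  unsigned int xmmword:1;
  unsigned int ymmword:1;
  unsigned int zmmword:1;
  unsigned int tmmword:1;
};

/* Return true when no size bit set in GIVEN is missing from WANTED.
   CHECK_TMM selects whether the tmmword bit takes part in the test.  */
static bool
simd_size_matches (struct simd_size given, struct simd_size wanted,
		   bool check_tmm)
{
  return !((given.xmmword && !wanted.xmmword)
	   || (given.ymmword && !wanted.ymmword)
	   || (given.zmmword && !wanted.zmmword)
	   || (check_tmm && given.tmmword && !wanted.tmmword));
}
```

Without the tmmword clause a %tmm operand sets none of the bits that are tested, so it matches any SIMD template. With the clause it is rejected unless the template asks for a tile register.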

Jan
Alan Modra via Binutils July 7, 2020, 8:19 a.m. | #31
Hi Jan,

Thank you very much for all the good suggestions; I have updated the AMX patch.

>> +  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
>> +  {
>> +    { Bad_Opcode },
>> +    { "tdpbf16ps", { XMT, EXtmm, Vextmm }, 0 },

> Along the lines of the above, EXt then (paralleling EXx)? For the last operand here I'd suggest VexTmm or VexTMM.


EXtmm parallels EXxmm and EXymm, so I kept that name. Otherwise I took all your suggestions and revised the AMX patch.


Subject: [PATCH] x86: Add support for Intel AMX instructions

gas/
	* doc/c-i386.texi: Document amx_int8, amx_bf16 and amx_tile.
	* config/tc-i386.c (i386_error): Add invalid_sib_address.
	(cpu_arch): Add .amx_int8, .amx_bf16 and .amx_tile.
	(cpu_noarch): Add noamx_int8, noamx_bf16 and noamx_tile.
	(match_simd_size): Add tmmword check.
	(operand_type_match): Add tmmword.
	(type_names): Add rTMM.
	(check_VecOperands): Handle invalid_sib_address.
	(match_template): Handle invalid_sib_address.
	(build_modrm_byte): Handle non-vector SIB and tmmword.
	(i386_index_check): Disallow RegIP for non-vector SIB.
	(check_register): Handle tmmword.
	* testsuite/gas/i386/i386.exp: Add AMX new tests.
	* testsuite/gas/i386/intel-regs.d: Add tmm.
	* testsuite/gas/i386/intel-regs.s: Add tmm.
	* testsuite/gas/i386/x86-64-amx-intel.d: New.
	* testsuite/gas/i386/x86-64-amx-inval.l: New.
	* testsuite/gas/i386/x86-64-amx-inval.s: New.
	* testsuite/gas/i386/x86-64-amx.d: New.
	* testsuite/gas/i386/x86-64-amx.s: New.
	* testsuite/gas/i386/x86-64-amx-bad.d: New.
	* testsuite/gas/i386/x86-64-amx-bad.s: New.

opcodes/
	* i386-dis.c (TMM): New.
	(EXtmm): Likewise.
	(VexTmm): Likewise.
	(tmm_mode): Likewise.
	(REG_VEX_0F3849_P_0_W_0_M_1): Likewise.
	(MOD_VEX_0F3849_P_0_W_0): Likewise.
	(MOD_VEX_0F3849_P_2_W_0): Likewise.
	(MOD_VEX_0F3849_P_3_W_0): Likewise.
	(MOD_VEX_0F384B_P_1_W_0): Likewise.
	(MOD_VEX_0F384B_P_2_W_0): Likewise.
	(MOD_VEX_0F384B_P_3_W_0): Likewise.
	(MOD_VEX_0F385C_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_0_W_0): Likewise.
	(MOD_VEX_0F385E_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_2_W_0): Likewise.
	(MOD_VEX_0F385E_P_3_W_0): Likewise.
	(RM_VEX_0F3849_P_0_W_0_M_1_R_0): Likewise.
	(PREFIX_VEX_0F3849): Likewise.
	(PREFIX_VEX_0F384B): Likewise.
	(PREFIX_VEX_0F385C): Likewise.
	(PREFIX_VEX_0F385E): Likewise.
	(X86_64_0F01_REG_3): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385C_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_3_W_0_M_0_L_0): Likewise.
	(VEX_W_0F3849_P_0): Likewise.
	(VEX_W_0F3849_P_2): Likewise.
	(VEX_W_0F3849_P_3): Likewise.
	(VEX_W_0F384B_P_1): Likewise.
	(VEX_W_0F384B_P_2): Likewise.
	(VEX_W_0F384B_P_3): Likewise.
	(VEX_W_0F385C_P_1): Likewise.
	(VEX_W_0F385E_P_0): Likewise.
	(VEX_W_0F385E_P_1): Likewise.
	(VEX_W_0F385E_P_2): Likewise.
	(VEX_W_0F385E_P_3): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0): Likewise.
	(VEX_LEN_0F3849_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F385C_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_3_W_0_M_0): Likewise.
	(names_tmm): Likewise.
	(att_names_tmm): Likewise.
	(intel_operand_size): Handle void_mode.
	(OP_XMM): Handle tmm_mode.
	(OP_EX): Likewise.
	(OP_VEX): Likewise.
	* i386-gen.c (cpu_flag_init): Add entries for
	CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(operand_type_shorthands): Add RegTMM.
	(operand_type_init): Likewise.
	(operand_types): Add Tmmword.
	(cpu_flag_init): Add CPU_AMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(cpu_flags): Add CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	* i386-opc.h (CpuAMX_INT8): New.
	(CpuAMX_BF16): Likewise.
	(CpuAMX_TILE): Likewise.
	(SIBMEM): Likewise.
	(Tmmword): Likewise.
	(i386_cpu_flags): Add cpuamx_int8, cpuamx_bf16 and cpuamx_tile.
	(i386_opcode_modifier): Extend width of fields vexvvvv and sib.
	(i386_operand_type): Add tmmword.
	* i386-opc.tbl: Add AMX instructions.
	* i386-reg.tbl: Add AMX registers.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
---
 gas/config/tc-i386.c                      |  76 ++++-
 gas/doc/c-i386.texi                       |   7 +
 gas/testsuite/gas/i386/i386.exp           |   4 +
 gas/testsuite/gas/i386/intel-regs.d       |   4 +
 gas/testsuite/gas/i386/intel-regs.s       |   4 +
 gas/testsuite/gas/i386/x86-64-amx-bad.d   |  18 ++
 gas/testsuite/gas/i386/x86-64-amx-bad.s   |  28 ++
 gas/testsuite/gas/i386/x86-64-amx-intel.d |  76 +++++
 gas/testsuite/gas/i386/x86-64-amx-inval.l |  11 +
 gas/testsuite/gas/i386/x86-64-amx-inval.s |  16 +
 gas/testsuite/gas/i386/x86-64-amx.d       |  76 +++++
 gas/testsuite/gas/i386/x86-64-amx.s       |  67 ++++
 opcodes/i386-dis.c                        | 377 +++++++++++++++++++++-
 opcodes/i386-gen.c                        |  18 ++
 opcodes/i386-opc.h                        |  16 +-
 opcodes/i386-opc.tbl                      |  23 ++
 opcodes/i386-reg.tbl                      |   9 +
 17 files changed, 811 insertions(+), 19 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 2e0eb24753..ea39fcab87 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -290,6 +290,7 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
     unsupported_vector_index_register,
@@ -372,6 +373,9 @@ struct _i386_insn
     /* Has ZMM register operands.  */
     bfd_boolean has_regzmm;
 
+    /* Has TMM register operands.  */
+    bfd_boolean has_regtmm;
+
     /* Has GOTPC or TLS relocation.  */
     bfd_boolean has_gotpc_tls_reloc;
 
@@ -1201,6 +1205,12 @@ static const arch_entry cpu_arch[] =
     CPU_WAITPKG_FLAGS, 0 },
   { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,
     CPU_CLDEMOTE_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_int8"), PROCESSOR_UNKNOWN,
+    CPU_AMX_INT8_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_bf16"), PROCESSOR_UNKNOWN,
+    CPU_AMX_BF16_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_tile"), PROCESSOR_UNKNOWN,
+    CPU_AMX_TILE_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdiri"), PROCESSOR_UNKNOWN,
     CPU_MOVDIRI_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdir64b"), PROCESSOR_UNKNOWN,
@@ -1259,6 +1269,9 @@ static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },
   { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },
   { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },
+  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },
+  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },
+  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },
   { STRING_COMMA_LEN ("nomovdiri"), CPU_ANY_MOVDIRI_FLAGS },
   { STRING_COMMA_LEN ("nomovdir64b"), CPU_ANY_MOVDIR64B_FLAGS },
   { STRING_COMMA_LEN ("noavx512_bf16"), CPU_ANY_AVX512_BF16_FLAGS },
@@ -2159,7 +2172,9 @@ match_simd_size (const insn_template *t, unsigned int wanted,
 	   || (i.types[given].bitfield.ymmword
 	       && !t->operand_types[wanted].bitfield.ymmword)
 	   || (i.types[given].bitfield.zmmword
-	       && !t->operand_types[wanted].bitfield.zmmword));
+	       && !t->operand_types[wanted].bitfield.zmmword)
+	   || (i.types[given].bitfield.tmmword
+	       && !t->operand_types[wanted].bitfield.tmmword));
 }
 
 /* Return 1 if there is no conflict in any size between operand GIVEN
@@ -2296,6 +2311,7 @@ operand_type_match (i386_operand_type overlap,
   temp.bitfield.xmmword = 0;
   temp.bitfield.ymmword = 0;
   temp.bitfield.zmmword = 0;
+  temp.bitfield.tmmword = 0;
   if (operand_type_all_zero (&temp))
     goto mismatch;
 
@@ -3304,6 +3320,7 @@ const type_names[] =
   { OPERAND_TYPE_REGXMM, "rXMM" },
   { OPERAND_TYPE_REGYMM, "rYMM" },
   { OPERAND_TYPE_REGZMM, "rZMM" },
+  { OPERAND_TYPE_REGTMM, "rTMM" },
   { OPERAND_TYPE_REGMASK, "Mask reg" },
 };
 
@@ -5790,7 +5807,7 @@ check_VecOperands (const insn_template *t)
 
   /* For VSIB byte, we need a vector register for index, and all vector
      registers must be distinct.  */
-  if (t->opcode_modifier.sib)
+  if (t->opcode_modifier.sib && t->opcode_modifier.sib != SIBMEM)
     {
       if (!i.index_reg
 	  || !((t->opcode_modifier.sib == VECSIB128
@@ -6584,6 +6601,9 @@ match_template (char mnem_suffix)
 	  as_bad (_("unsupported instruction `%s'"),
 		  current_templates->start->name);
 	  return NULL;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
 	case invalid_vsib_address:
 	  err_msg = _("invalid VSIB address");
 	  break;
@@ -7923,8 +7943,11 @@ build_modrm_byte (void)
 	  else if (i.op[dest].regs->reg_type.bitfield.class == RegSIMD
 		   || i.op[source].regs->reg_type.bitfield.class == RegSIMD)
 	    {
-	      if (i.types[dest].bitfield.zmmword
-		  || i.types[source].bitfield.zmmword)
+	      if (i.types[dest].bitfield.tmmword
+		  || i.types[source].bitfield.tmmword)
+		i.has_regtmm = TRUE;
+	      else if (i.types[dest].bitfield.zmmword
+		       || i.types[source].bitfield.zmmword)
 		i.has_regzmm = TRUE;
 	      else if (i.types[dest].bitfield.ymmword
 		       || i.types[source].bitfield.ymmword)
@@ -7966,7 +7989,9 @@ build_modrm_byte (void)
 
 	  if (i.tm.opcode_modifier.sib)
 	    {
-	      if (i.index_reg->reg_num == RegIZ)
+	      /* The index register of VSIB shouldn't be RegIZ.  */
+	      if (i.tm.opcode_modifier.sib != SIBMEM
+		  && i.index_reg->reg_num == RegIZ)
 		abort ();
 
 	      i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
@@ -7989,8 +8014,19 @@ build_modrm_byte (void)
 		      i.types[op].bitfield.disp32s = 1;
 		    }
 		}
-	      i.sib.index = i.index_reg->reg_num;
-	      set_rex_vrex (i.index_reg, REX_X, FALSE);
+
+	      /* Since the mandatory SIB always has an index register,
+		 the code logic remains unchanged.  A non-mandatory SIB
+		 without an index register is allowed and will be handled
+		 later.  */
+	      if (i.index_reg)
+		{
+		  if (i.index_reg->reg_num == RegIZ)
+		    i.sib.index = NO_INDEX_REGISTER;
+		  else
+		    i.sib.index = i.index_reg->reg_num;
+		  set_rex_vrex (i.index_reg, REX_X, FALSE);
+		}
 	    }
 
 	  default_seg = &ds;
@@ -8004,7 +8040,9 @@ build_modrm_byte (void)
 		{
 		  i386_operand_type newdisp;
 
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Both check for VSIB and mandatory non-vector SIB. */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);
 		  /* Operand is just <disp>  */
 		  if (flag_code == CODE_64BIT)
 		    {
@@ -8142,7 +8180,11 @@ build_modrm_byte (void)
 	      i.sib.scale = i.log2_scale_factor;
 	      if (i.index_reg == 0)
 		{
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only check for VSIB. */
+		  gas_assert (i.tm.opcode_modifier.sib != VECSIB128
+			      && i.tm.opcode_modifier.sib != VECSIB256
+			      && i.tm.opcode_modifier.sib != VECSIB512);
+
 		  /* <disp>(%esp) becomes two byte modrm with no index
 		     register.  We've already stored the code for esp
 		     in i.rm.regmem ie. ESCAPE_TO_TWO_BYTE_ADDRESSING.
@@ -8267,7 +8309,9 @@ build_modrm_byte (void)
 		break;
 	      if (i.types[op].bitfield.class == RegSIMD)
 		{
-		  if (i.types[op].bitfield.zmmword)
+		  if (i.types[op].bitfield.tmmword)
+		    i.has_regtmm = TRUE;
+		  else if (i.types[op].bitfield.zmmword)
 		    i.has_regzmm = TRUE;
 		  else if (i.types[op].bitfield.ymmword)
 		    i.has_regymm = TRUE;
@@ -10926,9 +10970,10 @@ i386_index_check (const char *operand_string)
 		      || !i.index_reg->reg_type.bitfield.baseindex)))
 	    goto bad_address;
 
-	  /* bndmk, bndldx, and bndstx have special restrictions. */
+	  /* bndmk, bndldx, bndstx and mandatory non-vector SIB have special restrictions. */
 	  if (current_templates->start->base_opcode == 0xf30f1b
-	      || (current_templates->start->base_opcode & ~1) == 0x0f1a)
+	      || (current_templates->start->base_opcode & ~1) == 0x0f1a
+	      || current_templates->start->opcode_modifier.sib == SIBMEM)
 	    {
 	      /* They cannot use RIP-relative addressing. */
 	      if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -10938,7 +10983,7 @@ i386_index_check (const char *operand_string)
 		}
 
 	      /* bndldx and bndstx ignore their scale factor. */
-	      if (current_templates->start->base_opcode != 0xf30f1b
+	      if ((current_templates->start->base_opcode & ~1) == 0x0f1a
 		  && i.log2_scale_factor)
 		as_warn (_("register scaling is being ignored here"));
 	    }
@@ -12440,6 +12485,11 @@ static bfd_boolean check_register (const reg_entry *r)
 	}
     }
 
+  if (r->reg_type.bitfield.tmmword
+      && (!cpu_arch_flags.bitfield.cpuamx_tile
+          || flag_code != CODE_64BIT))
+    return FALSE;
+
   if (r->reg_type.bitfield.class == RegBND && !cpu_arch_flags.bitfield.cpumpx)
     return FALSE;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index d4e6fcb698..cb86cc7968 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -226,6 +226,12 @@ accept various extension mnemonics.  For example,
 @code{noenqcmd},
 @code{noserialize},
 @code{notsxldtrk},
+@code{amx_int8},
+@code{noamx_int8},
+@code{amx_bf16},
+@code{noamx_bf16},
+@code{amx_tile},
+@code{noamx_tile},
 @code{vmx},
 @code{vmfunc},
 @code{smx},
@@ -1504,6 +1510,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
+@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 55929d3acb..bd4adb07ef 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -1137,6 +1137,10 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"
+    run_list_test "x86-64-amx-inval"
+    run_dump_test "x86-64-amx"
+    run_dump_test "x86-64-amx-intel"
+    run_dump_test "x86-64-amx-bad"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/intel-regs.d b/gas/testsuite/gas/i386/intel-regs.d
index 65bcb6ca7d..480b291c91 100644
--- a/gas/testsuite/gas/i386/intel-regs.d
+++ b/gas/testsuite/gas/i386/intel-regs.d
@@ -6,6 +6,7 @@
 
 Disassembly of section \.text:
 0+0 <.*>:
+.*[ 	]+R_386_32[ 	]+tmm1
 .*[ 	]+R_386_16[ 	]+eax
 .*[ 	]+R_386_16[ 	]+rax
 .*[ 	]+R_386_16[ 	]+axl
@@ -53,4 +54,7 @@ Disassembly of section \.text:
 
 .* <ymm8>:
 .*[ 	]+<ymm8>
+
+.* <tmm0>:
+.*[ 	]+<tmm0>
 #pass
diff --git a/gas/testsuite/gas/i386/intel-regs.s b/gas/testsuite/gas/i386/intel-regs.s
index 66ab16dfc5..44e369bb0f 100644
--- a/gas/testsuite/gas/i386/intel-regs.s
+++ b/gas/testsuite/gas/i386/intel-regs.s
@@ -1,6 +1,8 @@
 	.text
 	.intel_syntax noprefix
 
+	mov	eax, tmm1
+
 	.arch i286
 	.code16
 	mov	ax, eax			; add	[bx+si], al
@@ -59,3 +61,5 @@
 	mov	rax, r8
 ymm8:
 	jmp	ymm8
+tmm0:
+	jmp	tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.d b/gas/testsuite/gas/i386/x86-64-amx-bad.d
new file mode 100644
index 0000000000..dd4b607d42
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.d
@@ -0,0 +1,18 @@
+#as:
+#objdump: -drw
+#name: x86_64 Illegal AMX insns
+#source: x86-64-amx-bad.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <\.text>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 d2 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 56 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 62 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,\(bad\)
+[ 	]*[a-f0-9]+:[ 	]*c4 c2 52 5c dc[ 	]*tdpbf16ps %tmm5,\(bad\),%tmm3
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.s b/gas/testsuite/gas/i386/x86-64-amx-bad.s
new file mode 100644
index 0000000000..9c2fa22cc5
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
@@ -0,0 +1,28 @@
+.text
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0xd2
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.L = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x56
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.R = 0 (illegal value).
+	.byte 0xc4
+	.byte 0x62
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
+	.byte 0xc4
+	.byte 0xc2
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+
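Reviewer note: the byte sequences in x86-64-amx-bad.s differ from the valid encoding `c4 e2 52 5c dc` in exactly one field of the three-byte VEX prefix each. The sketch below decodes those fields so the "illegal value" comments can be checked by hand. It is only an illustration: `struct vex3_fields` and `decode_vex3` are hypothetical names and are not part of binutils. The bit layout used is the standard three-byte VEX layout (R, X, B and vvvv are stored inverted).

```c
#include <stdint.h>

/* Fields of a 3-byte (0xC4) VEX prefix.  Hypothetical helper type,
   not taken from binutils.  r_bit/x_bit/b_bit hold the bits exactly
   as stored (i.e. still inverted); vvvv is already un-inverted.  */
struct vex3_fields
{
  unsigned r_bit, x_bit, b_bit;	/* Stored bits; 0 selects registers 8-15.  */
  unsigned map;			/* mmmmm; 2 selects the 0F38 opcode map.  */
  unsigned w;			/* VEX.W.  */
  unsigned vvvv;		/* Extra source register number.  */
  unsigned l;			/* VEX.L.  */
  unsigned pp;			/* 0 = none, 1 = 66, 2 = F3, 3 = F2.  */
};

/* Decode P1 and P2, the two payload bytes that follow 0xC4.  */
static struct vex3_fields
decode_vex3 (uint8_t p1, uint8_t p2)
{
  struct vex3_fields f;

  f.r_bit = (p1 >> 7) & 1;
  f.x_bit = (p1 >> 6) & 1;
  f.b_bit = (p1 >> 5) & 1;
  f.map = p1 & 0x1f;
  f.w = (p2 >> 7) & 1;
  f.vvvv = ~(p2 >> 3) & 0xf;
  f.l = (p2 >> 2) & 1;
  f.pp = p2 & 3;
  return f;
}
```

With this, `decode_vex3 (0xe2, 0x52)` describes the valid form (W = 0, L = 0, vvvv = 5, all of R/X/B stored as 1), while the four test cases flip W, L, the stored R bit and the stored B bit respectively. A stored R or B bit of 0 extends the register number past 7, which is why the expected dump shows `(bad)` in place of a tile register.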
diff --git a/gas/testsuite/gas/i386/x86-64-amx-intel.d b/gas/testsuite/gas/i386/x86-64-amx-intel.d
new file mode 100644
index 0000000000..4422ac6946
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d
@@ -0,0 +1,76 @@
+#as:
+#objdump: -d -Mintel
+#name: x86_64 AMX insns in Intel syntax
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c1[ 	]*tdpbssd tmm0,tmm1,tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c8[ 	]*tdpbssd tmm1,tmm0,tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 5e c9[ 	]*tdpbssd tmm1,tmm1,tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c1[ 	]*tdpbssd tmm0,tmm1,tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c8[ 	]*tdpbssd tmm1,tmm0,tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 5e c9[ 	]*tdpbssd tmm1,tmm1,tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.l b/gas/testsuite/gas/i386/x86-64-amx-inval.l
new file mode 100644
index 0000000000..adcffadc5b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.l
@@ -0,0 +1,11 @@
+.* Assembler messages:
+.*:5: Error: `\(%rip\)' cannot be used here
+.*:6: Error: `\(%rip\)' cannot be used here
+.*:7: Error: `\(%rip\)' cannot be used here
+.*:8: Error: operand size mismatch for `tdpbssd'
+.*:9: Error: operand size mismatch for `vaddps'
+.*:12: Error: `\[rip\]' cannot be used here
+.*:13: Error: `\[rip\]' cannot be used here
+.*:14: Error: `\[rip\]' cannot be used here
+.*:15: Error: operand size mismatch for `tdpbssd'
+.*:16: Error: operand size mismatch for `vaddps'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.s b/gas/testsuite/gas/i386/x86-64-amx-inval.s
new file mode 100644
index 0000000000..6079767048
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.s
@@ -0,0 +1,16 @@
+# Check illegal SIBMEM and register size used in AMX instructions
+
+    .text
+_start:
+    tileloadd (%rip), %tmm1
+    tileloaddt1 (%rip), %tmm1
+    tilestored %tmm1, (%rip)
+    tdpbssd %xmm1, %xmm2, %xmm3
+    vaddps %tmm1, %tmm2, %tmm3
+
+    .intel_syntax noprefix
+    tileloadd tmm1, [rip]
+    tileloaddt1 tmm1, [rip]
+    tilestored [rip], tmm1
+    tdpbssd xmm3, xmm2, xmm1
+    vaddps tmm3, tmm2, tmm1
diff --git a/gas/testsuite/gas/i386/x86-64-amx.d b/gas/testsuite/gas/i386/x86-64-amx.d
new file mode 100644
index 0000000000..effb152f9c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.d
@@ -0,0 +1,76 @@
+#as:
+#objdump: -d
+#name: x86_64 AMX insns
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c1[ 	]*tdpbssd %tmm1,%tmm1,%tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c8[ 	]*tdpbssd %tmm1,%tmm0,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 5e c9[ 	]*tdpbssd %tmm0,%tmm1,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c1[ 	]*tdpbssd %tmm1,%tmm1,%tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c8[ 	]*tdpbssd %tmm1,%tmm0,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 5e c9[ 	]*tdpbssd %tmm0,%tmm1,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx.s b/gas/testsuite/gas/i386/x86-64-amx.s
new file mode 100644
index 0000000000..ab7682bcfc
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.s
@@ -0,0 +1,67 @@
+
+  .allow_index_reg
+  .text
+_start:
+  ldtilecfg  (%rcx,%rdx,2)
+  sttilecfg  (%rcx,%rdx,2)
+  tdpbf16ps %tmm5, %tmm4, %tmm3
+  tdpbssd %tmm3, %tmm2, %tmm1
+  tdpbssd %tmm1, %tmm1, %tmm0
+  tdpbssd %tmm1, %tmm0, %tmm1
+  tdpbssd %tmm0, %tmm1, %tmm1
+  tdpbsud %tmm3, %tmm2, %tmm1
+  tdpbusd %tmm3, %tmm2, %tmm1
+  tdpbuud %tmm3, %tmm2, %tmm1
+  tileloadd foo, %tmm5
+  tileloadd (%rcx), %tmm5
+  tileloadd (%ecx), %tmm5
+  tileloadd (%rcx,%rdx,1), %tmm5
+  tileloadd (%ecx,%edx,2), %tmm1
+  tileloaddt1 foo, %tmm5
+  tileloaddt1 (%rcx), %tmm5
+  tileloaddt1 (%ecx), %tmm5
+  tileloaddt1 (%rcx,%rdx,1), %tmm5
+  tileloaddt1 (%ecx,%edx,2), %tmm1
+  tileloaddt1 (%rcx,%riz,2), %tmm1
+  tilerelease
+  tilestored %tmm5, (%rcx)
+  tilestored %tmm5, (%ecx)
+  tilestored %tmm5, (%rcx,%rdx,1)
+  tilestored %tmm1, (%ecx,%edx,2)
+  tilezero %tmm0
+  tilezero %tmm5
+  tilezero %tmm7
+
+
+  .intel_syntax noprefix
+  ldtilecfg  [rcx]
+  ldtilecfg  [rbx]
+  sttilecfg  [rcx]
+  sttilecfg  [rbx]
+  tdpbf16ps tmm3, tmm4, tmm5
+  tdpbssd tmm1, tmm2, tmm3
+  tdpbssd tmm0, tmm1, tmm1
+  tdpbssd tmm1, tmm0, tmm1
+  tdpbssd tmm1, tmm1, tmm0
+  tdpbsud tmm1, tmm2, tmm3
+  tdpbusd tmm1, tmm2, tmm3
+  tdpbuud tmm1, tmm2, tmm3
+  tileloadd tmm5, foo
+  tileloadd tmm5, [rcx]
+  tileloadd tmm5, [ecx]
+  tileloadd tmm5, [rcx+rdx]
+  tileloadd tmm1, [ecx+edx*2]
+  tileloaddt1 tmm5, foo
+  tileloaddt1 tmm5, [rcx]
+  tileloaddt1 tmm5, [ecx]
+  tileloaddt1 tmm5, [rcx+rdx]
+  tileloaddt1 tmm1, [ecx+edx*2]
+  tileloaddt1 tmm1, [rcx+riz*2]
+  tilerelease
+  tilestored [rcx], tmm5
+  tilestored [ecx], tmm5
+  tilestored [rcx+rdx], tmm5
+  tilestored [ecx+edx*2], tmm1
+  tilezero tmm0
+  tilezero tmm5
+  tilezero tmm7
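Reviewer note: in the expected dumps above, every tile load and store carries a SIB byte, even when the source only names a base register. For example, `tileloadd (%rcx), %tmm5` assembles to `c4 e2 7b 4b 2c 21` and disassembles as `(%rcx,%riz,1)`. The sketch below splits the ModRM and SIB bytes into their fields to make that visible. It is an illustration only: `struct sib_operand` and `decode_modrm_sib` are hypothetical names, not binutils code, and REX/VEX register extension bits are deliberately ignored.

```c
#include <stdint.h>

/* Decoded ModRM/SIB pair.  Hypothetical helper type for illustration.  */
struct sib_operand
{
  unsigned mod, reg, rm;	/* ModRM fields.  */
  int has_sib;			/* Nonzero when a SIB byte follows.  */
  unsigned scale;		/* 1, 2, 4 or 8; valid only if has_sib.  */
  unsigned index;		/* 4 means "no index" (shown as riz/eiz).  */
  unsigned base;		/* 5 with mod == 0 means disp32, no base.  */
};

/* Decode MODRM and, when ModRM selects it, the following SIB byte.  */
static struct sib_operand
decode_modrm_sib (uint8_t modrm, uint8_t sib)
{
  struct sib_operand op = { 0 };

  op.mod = modrm >> 6;
  op.reg = (modrm >> 3) & 7;
  op.rm = modrm & 7;
  /* rm == 4 with a memory operand means "SIB byte follows".  */
  op.has_sib = op.mod != 3 && op.rm == 4;
  if (op.has_sib)
    {
      op.scale = 1u << (sib >> 6);
      op.index = (sib >> 3) & 7;
      op.base = sib & 7;
    }
  return op;
}
```

For the bytes `2c 21` this gives reg = 5 (tmm5), base = 1 (rcx), index = 4 (none) and scale = 1. For `0c 51` it gives reg = 1, base = 1, index = 2 (rdx) and scale = 2, matching `(%ecx,%edx,2),%tmm1` in the dump.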
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 956e2c3539..2905d62e25 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -375,6 +375,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define XMScalar { OP_XMM, scalar_mode }
 #define XMGatherQ { OP_XMM, vex_vsib_q_w_dq_mode }
 #define XMM { OP_XMM, xmm_mode }
+#define TMM { OP_XMM, tmm_mode }
 #define XMxmmq { OP_XMM, xmmq_mode }
 #define EM { OP_EM, v_mode }
 #define EMS { OP_EM, v_swap_mode }
@@ -391,6 +392,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXxS { OP_EX, x_swap_mode }
 #define EXxmm { OP_EX, xmm_mode }
 #define EXymm { OP_EX, ymm_mode }
+#define EXtmm { OP_EX, tmm_mode }
 #define EXxmmq { OP_EX, xmmq_mode }
 #define EXEvexHalfBcstXmmq { OP_EX, evex_half_bcst_xmmq_mode }
 #define EXxmm_mb { OP_EX, xmm_mb_mode }
@@ -421,6 +423,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Vex128 { OP_VEX, vex128_mode }
 #define Vex256 { OP_VEX, vex256_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexTmm { OP_VEX, tmm_mode }
 #define EXdVexScalarS { OP_EX_Vex, d_scalar_swap_mode }
 #define EXqVexScalarS { OP_EX_Vex, q_scalar_swap_mode }
 #define EXVexW { OP_EX_VexW, x_mode }
@@ -542,6 +545,8 @@ enum
   ymmq_mode,
   /* 32-byte YMM or 16-byte word operand */
   ymmxmm_mode,
+  /* TMM operand */
+  tmm_mode,
   /* d_mode in 32bit, q_mode in 64bit mode.  */
   m_mode,
   /* pair of v_mode operands */
@@ -743,6 +748,7 @@ enum
   REG_VEX_0F72,
   REG_VEX_0F73,
   REG_VEX_0FAE,
+  REG_VEX_0F3849_P_0_W_0_M_1,
   REG_VEX_0F38F3,
   REG_XOP_LWPCB,
   REG_XOP_LWP,
@@ -826,6 +832,17 @@ enum
   MOD_0FE7_PREFIX_2,
   MOD_0FF0_PREFIX_3,
   MOD_0F382A_PREFIX_2,
+  MOD_VEX_0F3849_P_0_W_0,
+  MOD_VEX_0F3849_P_2_W_0,
+  MOD_VEX_0F3849_P_3_W_0,
+  MOD_VEX_0F384B_P_1_W_0,
+  MOD_VEX_0F384B_P_2_W_0,
+  MOD_VEX_0F384B_P_3_W_0,
+  MOD_VEX_0F385C_P_1_W_0,
+  MOD_VEX_0F385E_P_0_W_0,
+  MOD_VEX_0F385E_P_1_W_0,
+  MOD_VEX_0F385E_P_2_W_0,
+  MOD_VEX_0F385E_P_3_W_0,
   MOD_0F38F5_PREFIX_2,
   MOD_0F38F6_PREFIX_0,
   MOD_0F38F8_PREFIX_1,
@@ -963,6 +980,7 @@ enum
   RM_0F1E_P_1_MOD_3_REG_7,
   RM_0FAE_REG_6_MOD_3_P_0,
   RM_0FAE_REG_7_MOD_3,
+  RM_VEX_0F3849_P_0_W_0_M_1_R_0
 };
 
 enum
@@ -1298,9 +1316,13 @@ enum
   PREFIX_VEX_0F3845,
   PREFIX_VEX_0F3846,
   PREFIX_VEX_0F3847,
+  PREFIX_VEX_0F3849,
+  PREFIX_VEX_0F384B,
   PREFIX_VEX_0F3858,
   PREFIX_VEX_0F3859,
   PREFIX_VEX_0F385A,
+  PREFIX_VEX_0F385C,
+  PREFIX_VEX_0F385E,
   PREFIX_VEX_0F3878,
   PREFIX_VEX_0F3879,
   PREFIX_VEX_0F388C,
@@ -1673,7 +1695,19 @@ enum
   X86_64_0F01_REG_0,
   X86_64_0F01_REG_1,
   X86_64_0F01_REG_2,
-  X86_64_0F01_REG_3
+  X86_64_0F01_REG_3,
+  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0,
+  X86_64_VEX_0F3849_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F385C_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_3_W_0_M_0_L_0
 };
 
 enum
@@ -1758,7 +1792,19 @@ enum
   VEX_LEN_0F381A_P_2_M_0,
   VEX_LEN_0F3836_P_2,
   VEX_LEN_0F3841_P_2,
+  VEX_LEN_0F3849_P_0_W_0_M_0,
+  VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0,
+  VEX_LEN_0F3849_P_2_W_0_M_0,
+  VEX_LEN_0F3849_P_3_W_0_M_0,
+  VEX_LEN_0F384B_P_1_W_0_M_0,
+  VEX_LEN_0F384B_P_2_W_0_M_0,
+  VEX_LEN_0F384B_P_3_W_0_M_0,
   VEX_LEN_0F385A_P_2_M_0,
+  VEX_LEN_0F385C_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_0_W_0_M_0,
+  VEX_LEN_0F385E_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_2_W_0_M_0,
+  VEX_LEN_0F385E_P_3_W_0_M_0,
   VEX_LEN_0F38DB_P_2,
   VEX_LEN_0F38F2_P_0,
   VEX_LEN_0F38F3_R_1_P_0,
@@ -1926,9 +1972,20 @@ enum
   VEX_W_0F382F_P_2_M_0,
   VEX_W_0F3836_P_2,
   VEX_W_0F3846_P_2,
+  VEX_W_0F3849_P_0,
+  VEX_W_0F3849_P_2,
+  VEX_W_0F3849_P_3,
+  VEX_W_0F384B_P_1,
+  VEX_W_0F384B_P_2,
+  VEX_W_0F384B_P_3,
   VEX_W_0F3858_P_2,
   VEX_W_0F3859_P_2,
   VEX_W_0F385A_P_2_M_0,
+  VEX_W_0F385C_P_1,
+  VEX_W_0F385E_P_0,
+  VEX_W_0F385E_P_1,
+  VEX_W_0F385E_P_2,
+  VEX_W_0F385E_P_3,
   VEX_W_0F3878_P_2,
   VEX_W_0F3879_P_2,
   VEX_W_0F38CF_P_2,
@@ -3045,6 +3102,16 @@ static const char *att_names_zmm[] = {
   "%zmm28", "%zmm29", "%zmm30", "%zmm31"
 };
 
+static const char **names_tmm;
+static const char *intel_names_tmm[] = {
+  "tmm0", "tmm1", "tmm2", "tmm3",
+  "tmm4", "tmm5", "tmm6", "tmm7"
+};
+static const char *att_names_tmm[] = {
+  "%tmm0", "%tmm1", "%tmm2", "%tmm3",
+  "%tmm4", "%tmm5", "%tmm6", "%tmm7"
+};
+
 static const char **names_mask;
 static const char *intel_names_mask[] = {
   "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"
@@ -3413,6 +3480,10 @@ static const struct dis386 reg_table[][8] = {
     { MOD_TABLE (MOD_VEX_0FAE_REG_2) },
     { MOD_TABLE (MOD_VEX_0FAE_REG_3) },
   },
+  /* REG_VEX_0F3849_P_0_W_0_M_1 */
+  {
+    { RM_TABLE (RM_VEX_0F3849_P_0_W_0_M_1_R_0) },
+  },
   /* REG_VEX_0F38F3 */
   {
     { Bad_Opcode },
@@ -5794,6 +5865,22 @@ static const struct dis386 prefix_table[][4] = {
     { "vpsllv%LW", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F3849 */
+  {
+    { VEX_W_TABLE (VEX_W_0F3849_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3849_P_2) },
+    { VEX_W_TABLE (VEX_W_0F3849_P_3) },
+  },
+
+  /* PREFIX_VEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384B_P_1) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_2) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_3) },
+  },
+
   /* PREFIX_VEX_0F3858 */
   {
     { Bad_Opcode },
@@ -5815,6 +5902,21 @@ static const struct dis386 prefix_table[][4] = {
     { MOD_TABLE (MOD_VEX_0F385A_PREFIX_2) },
   },
 
+  /* PREFIX_VEX_0F385C */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F385C_P_1) },
+    { Bad_Opcode },
+  },
+
+  /* PREFIX_VEX_0F385E */
+  {
+    { VEX_W_TABLE (VEX_W_0F385E_P_0) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_1) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_2) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_3) },
+  },
+
   /* PREFIX_VEX_0F3878 */
   {
     { Bad_Opcode },
@@ -6830,6 +6932,78 @@ static const struct dis386 x86_64_table[][2] = {
     { "lidt{Q|Q}", { M }, 0 },
     { "lidt", { M }, 0 },
   },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "ldtilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilerelease", { Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "sttilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilezero", { TMM, Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilestored", { M, TMM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloaddt1", { TMM, M }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloadd", { TMM, M }, 0 },
+  },
+
+  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbf16ps", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbuud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbsud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbusd", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbssd", { TMM, EXtmm, VexTmm }, 0 },
+  },
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8671,9 +8845,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3847) },
     /* 48 */
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -8692,9 +8866,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3859) },
     { PREFIX_TABLE (PREFIX_VEX_0F385A) },
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385C) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385E) },
     { Bad_Opcode },
     /* 60 */
     { Bad_Opcode },
@@ -9432,12 +9606,72 @@ static const struct dis386 vex_len_table[][2] = {
     { "vphminposuw",	{ XM, EXx }, 0 },
   },
 
+  /* VEX_LEN_0F3849_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_3_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F385A_P_2_M_0 */
   {
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F385A_P_2_M_0) },
   },
 
+  /* VEX_LEN_0F385C_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385C_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F38DB_P_2 */
   {
     { "vaesimc",	{ XM, EXx }, 0 },
@@ -9930,6 +10164,30 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3846_P_2 */
     { "vpsravd",	{ XM, Vex, EXx }, 0 },
   },
+  {
+    /* VEX_W_0F3849_P_0 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_2 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_3 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_1 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_2 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_3 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3858_P_2 */
     { "vpbroadcastd", { XM, EXxmm_md }, 0 },
@@ -9942,6 +10200,26 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F385A_P_2_M_0 */
     { "vbroadcasti128", { XM, Mxmm }, 0 },
   },
+  {
+    /* VEX_W_0F385C_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385C_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_0 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_2 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_3 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3878_P_2 */
     { "vpbroadcastb",	{ XM, EXxmm_mb }, 0 },
@@ -10388,6 +10666,57 @@ static const struct dis386 mod_table[][2] = {
     /* MOD_0F382A_PREFIX_2 */
     { "movntdqa",	{ XM, Mx }, 0 },
   },
+  {
+    /* MOD_VEX_0F3849_P_0_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_0) },
+    { REG_TABLE (REG_VEX_0F3849_P_0_W_0_M_1) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_1_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_3_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385C_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385C_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_0_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_0_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_2_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_3_W_0_M_0) },
+  },
   {
     /* MOD_0F38F5_PREFIX_2 */
     { "wrussK",		{ M, Gdq }, PREFIX_OPCODE },
@@ -10949,6 +11278,10 @@ static const struct dis386 rm_table[][8] = {
     { "sfence",		{ Skip_MODRM }, 0 },
 
   },
+  {
+    /* RM_VEX_0F3849_P_0_W_0_M_1_R_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0) },
+  },
 };
 
 #define INTERNAL_DISASSEMBLER_ERROR _("<internal disassembler error>")
@@ -11845,6 +12178,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = intel_names_xmm;
       names_ymm = intel_names_ymm;
       names_zmm = intel_names_zmm;
+      names_tmm = intel_names_tmm;
       index64 = intel_index64;
       index32 = intel_index32;
       names_mask = intel_names_mask;
@@ -11867,6 +12201,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = att_names_xmm;
       names_ymm = att_names_ymm;
       names_zmm = att_names_zmm;
+      names_tmm = att_names_tmm;
       index64 = att_index64;
       index32 = att_index32;
       names_mask = att_names_mask;
@@ -15050,6 +15385,7 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != scalar_mode)
     {
       switch (vex.length)
@@ -15088,6 +15424,16 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+	{
+	  oappend ("(bad)");
+	  return;
+	}
+      names = names_tmm;
+    }
+
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15212,6 +15558,7 @@ OP_EX (int bytemode, int sizeflag)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != d_scalar_swap_mode
       && bytemode != q_scalar_swap_mode
       && bytemode != vex_scalar_w_dq_mode)
@@ -15247,6 +15594,15 @@ OP_EX (int bytemode, int sizeflag)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+	{
+	  oappend ("(bad)");
+	  return;
+	}
+      names = names_tmm;
+    }
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15802,6 +16158,17 @@ OP_VEX (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       return;
     }
 
+  if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+	{
+	  oappend ("(bad)");
+	  return;
+	}
+      oappend (names_tmm[reg]);
+      return;
+    }
+
   switch (vex.length)
     {
     case 128:
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 7230f87344..3334155071 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -297,6 +297,12 @@ static initializer cpu_flag_init[] =
     "CpuWAITPKG" },
   { "CPU_CLDEMOTE_FLAGS",
     "CpuCLDEMOTE" },
+  { "CPU_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_AMX_TILE_FLAGS",
+    "CpuAMX_TILE" },
   { "CPU_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_MOVDIR64B_FLAGS",
@@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =
     "CpuAVX512_BITALG" },
   { "CPU_ANY_AVX512_BF16_FLAGS",
     "CpuAVX512_BF16" },
+  { "CPU_ANY_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_ANY_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_ANY_AMX_TILE_FLAGS",
+    "CpuAMX_TILE|CpuAMX_INT8|CpuAMX_BF16" },
   { "CPU_ANY_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_ANY_MOVDIR64B_FLAGS",
@@ -459,6 +471,8 @@ static initializer operand_type_init[] =
     "Class=RegSIMD|Ymmword" },
   { "OPERAND_TYPE_REGZMM",
     "Class=RegSIMD|Zmmword" },
+  { "OPERAND_TYPE_REGTMM",
+    "Class=RegSIMD|Tmmword" },
   { "OPERAND_TYPE_REGMASK",
     "Class=RegMask" },
   { "OPERAND_TYPE_REGBND",
@@ -611,6 +625,9 @@ static bitfield cpu_flags[] =
   BITFIELD (CpuPCONFIG),
   BITFIELD (CpuWAITPKG),
   BITFIELD (CpuCLDEMOTE),
+  BITFIELD (CpuAMX_INT8),
+  BITFIELD (CpuAMX_BF16),
+  BITFIELD (CpuAMX_TILE),
   BITFIELD (CpuMOVDIRI),
   BITFIELD (CpuMOVDIR64B),
   BITFIELD (CpuENQCMD),
@@ -741,6 +758,7 @@ static bitfield operand_types[] =
   BITFIELD (Xmmword),
   BITFIELD (Ymmword),
   BITFIELD (Zmmword),
+  BITFIELD (Tmmword),
   BITFIELD (Unspecified),
 #ifdef OTUnused
   BITFIELD (OTUnused),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index c65febbe81..b8a6dfc25c 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -223,6 +223,12 @@ enum
   /* CET instructions support required */
   CpuIBT,
   CpuSHSTK,
+  /* AMX-INT8 instructions required */
+  CpuAMX_INT8,
+  /* AMX-BF16 instructions required */
+  CpuAMX_BF16,
+  /* AMX-TILE instructions required */
+  CpuAMX_TILE,
   /* GFNI instructions required */
   CpuGFNI,
   /* VAES instructions required */
@@ -372,6 +378,9 @@ typedef union i386_cpu_flags
       unsigned int cpuptwrite:1;
       unsigned int cpuibt:1;
       unsigned int cpushstk:1;
+      unsigned int cpuamx_int8:1;
+      unsigned int cpuamx_bf16:1;
+      unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
       unsigned int cpuvpclmulqdq:1;
@@ -574,7 +583,9 @@ enum
 #define VECSIB128	1
 #define VECSIB256	2
 #define VECSIB512	3
+#define SIBMEM		4
   SIB,
+
   /* SSE to AVX support required */
   SSE2AVX,
   /* No AVX equivalent */
@@ -702,7 +713,7 @@ typedef struct i386_opcode_modifier
   unsigned int vexw:2;
   unsigned int vexopcode:3;
   unsigned int vexsources:2;
-  unsigned int sib:2;
+  unsigned int sib:3;
   unsigned int sse2avx:1;
   unsigned int noavx:1;
   unsigned int evex:3;
@@ -807,6 +818,8 @@ enum
   Ymmword,
   /* ZMMWORD size.  */
   Zmmword,
+  /* TMMWORD size.  */
+  Tmmword,
   /* Unspecified memory size.  */
   Unspecified,
 
@@ -851,6 +864,7 @@ typedef union i386_operand_type
       unsigned int xmmword:1;
       unsigned int ymmword:1;
       unsigned int zmmword:1;
+      unsigned int tmmword:1;
       unsigned int unspecified:1;
 #ifdef OTUnused
       unsigned int unused:(OTNumOfBits - OTUnused);
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index cd6833c5ae..2a8ec52b41 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -52,6 +52,7 @@
 #define RegXMM Class=RegSIMD|Xmmword
 #define RegYMM Class=RegSIMD|Ymmword
 #define RegZMM Class=RegSIMD|Zmmword
+#define RegTMM Class=RegSIMD|Tmmword
 
 #define RegMask Class=RegMask
 
@@ -88,6 +89,7 @@
 #define VecSIB128 SIB=VECSIB128
 #define VecSIB256 SIB=VECSIB256
 #define VecSIB512 SIB=VECSIB512
+#define Sibmem SIB=SIBMEM|Modrm
 
 #define EVex128 EVex=EVEX128
 #define EVex256 EVex=EVEX256
@@ -4093,3 +4095,24 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|
 xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
 
 // TSXLDTRK instructions end.
+
+// AMX instructions.
+
+ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+
+tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+
+tileloadd, 2, 0xf24b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 2, 0x664b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 2, 0xf34b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }
+
+tilerelease, 0, 0x49c0, None, 2, CpuAMX_TILE|Cpu64, Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
+
+tilezero, 1, 0xf249, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM }
+
+// AMX instructions end.
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index cdff763ca7..ca7eeba488 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval
 zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval
 zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval
 zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval
+// TMM registers for AMX
+tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval
+tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval
+tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval
+tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval
+tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval
+tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval
+tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval
+tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval
 // Bound registers for MPX
 bnd0, Class=RegBND, 0, 0, Dw2Inval, Dw2Inval
 bnd1, Class=RegBND, 0, 1, Dw2Inval, Dw2Inval
-- 

Thanks,
Lili.
Jan Beulich July 7, 2020, 9:53 a.m. | #32
On 07.07.2020 10:19, Cui, Lili wrote:
>>> +  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */  {
>>> +    { Bad_Opcode },
>>> +    { "tdpbf16ps", { XMT, EXtmm, Vextmm }, 0 },
>
>> Along the lines of the above, EXt then (paralleling EXx)? For the last operand here I'd suggest VexTmm or VexTMM.
>
> EXtmm parallels EXxmm and EXymm.

Ah, I see.

> I took all your suggestions and revised the AMX patch.

Most afaics, but not all.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
> @@ -0,0 +1,28 @@
> +.text
> +	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
> +	.byte 0xc4
> +	.byte 0xe2
> +	.byte 0xd2
> +	.byte 0x5c
> +	.byte 0xdc
> +	.fill 0x05, 0x01, 0x90
> +	#tdpbf16ps %tmm3,%tmm2,%tmm1 set VEX.L = 1 (illegal value).
> +	.byte 0xc4
> +	.byte 0xe2
> +	.byte 0x56
> +	.byte 0x5c
> +	.byte 0xdc
> +	.fill 0x05, 0x01, 0x90
> +	#tdpbf16ps %tmm3,%tmm2,%tmm1 set VEX.R = 0 (illegal value).
> +	.byte 0xc4
> +	.byte 0x62
> +	.byte 0x52
> +	.byte 0x5c
> +	.byte 0xdc
> +	#tdpbf16ps %tmm3,%tmm2,%tmm1 set VEX.B = 0 (illegal value).
> +	.byte 0xc4
> +	.byte 0xc2
> +	.byte 0x52
> +	.byte 0x5c
> +	.byte 0xdc


What about the high bit of VEX.VVVV being zero?
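
To make this case concrete, here is a minimal sketch of how the fields of a three-byte VEX prefix can be pulled apart. It is not binutils code: `struct vex3_fields` and `decode_vex3` are names made up for illustration. It relies only on the architectural layout (R, X, B and vvvv are stored inverted), so a stored vvvv with its high bit clear names a register number of 8 or above, which no tile register has.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a three-byte (0xc4) VEX prefix, already un-inverted.  */
struct vex3_fields
{
  unsigned int r, x, b;	/* Register extension bits (1 = extended).  */
  unsigned int map;	/* mmmmm: 1 = 0F, 2 = 0F38, 3 = 0F3A.  */
  unsigned int w;
  unsigned int vvvv;	/* Register number, 0..15.  */
  unsigned int l;	/* VEX.L; 0 = 128-bit, 1 = 256-bit.  */
  unsigned int pp;	/* 0 = none, 1 = 66, 2 = F3, 3 = F2.  */
};

/* Hypothetical helper: decode the three prefix bytes starting at P.  */
static struct vex3_fields
decode_vex3 (const uint8_t *p)
{
  struct vex3_fields f;

  assert (p[0] == 0xc4);
  /* R, X and B are stored inverted in the second byte.  */
  f.r = !(p[1] & 0x80);
  f.x = !(p[1] & 0x40);
  f.b = !(p[1] & 0x20);
  f.map = p[1] & 0x1f;
  f.w = (p[2] >> 7) & 1;
  /* vvvv is stored inverted as well; flip the four bits back.  */
  f.vvvv = ((p[2] >> 3) & 0xf) ^ 0xf;
  f.l = (p[2] >> 2) & 1;
  f.pp = p[2] & 3;
  return f;
}
```

With the bytes from the test, `c4 e2 52` decodes to map 0F38, W = 0, L = 0, pp = F3 and vvvv = 5, while a third byte of `0x32` (stored vvvv = 0110) decodes to vvvv = 9, which is the case the disassembler would need to reject.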

What about the case of there not being a SIB byte?
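
As a rough illustration of what "no SIB byte" means at the encoding level, the sketch below tests a ModRM byte under 32-bit or 64-bit addressing (16-bit addressing has no SIB at all). The helper name `modrm_has_sib` is hypothetical and does not exist in binutils.

```c
#include <assert.h>

/* Hypothetical helper: return 1 if MODRM selects a memory operand that
   is followed by a SIB byte, i.e. mod != 3 and r/m == 4.  */
static int
modrm_has_sib (unsigned int modrm)
{
  assert (modrm <= 0xff);
  return (modrm >> 6) != 3 && (modrm & 7) == 4;
}
```

For the tile load/store encodings in the tests, ModRM `0x2c` and `0x0c` have r/m = 4 and so carry a SIB byte, whereas a hand-built ModRM of `0x09` (r/m = 1) does not and would be the form to flag.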

What about the case of any two operands being the same, which I
think the assembler also still doesn't error on, as one can see
...

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d
> @@ -0,0 +1,76 @@
> +#as:
> +#objdump: -d -Mintel
> +#name: x86_64 AMX insns in Intel syntax
> +#source: x86-64-amx.s
> +
> +.*: +file format .*
> +
> +
> +Disassembly of section \.text:
> +
> +0+ <_start>:
> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]
> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]
> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c1[ 	]*tdpbssd tmm0,tmm1,tmm1
> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c8[ 	]*tdpbssd tmm1,tmm0,tmm1
> +[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 5e c9[ 	]*tdpbssd tmm1,tmm1,tmm0


... here (in my earlier reply I had specifically given the comment
in the context of the "inval" test).
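
For clarity, the assembler-side check being asked for boils down to a pairwise comparison of the three tile register numbers. The following is only a sketch under that assumption; `tmm_operands_distinct` is a made-up name, and the real check would live in gas and operate on its operand structures.

```c
#include <assert.h>

/* Hypothetical helper: return 1 if the three tile register numbers
   (each in the range 0..7) are pairwise distinct, 0 otherwise.  */
static int
tmm_operands_distinct (unsigned int a, unsigned int b, unsigned int c)
{
  assert (a < 8 && b < 8 && c < 8);
  return a != b && a != c && b != c;
}
```

Under this check, `tmm1,tmm2,tmm3` is accepted, while each of the three `tdpbssd` forms quoted above that repeat `tmm1` would be rejected.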

> @@ -6830,6 +6932,78 @@ static const struct dis386 x86_64_table[][2] = {
>      { "lidt{Q|Q}", { M }, 0 },
>      { "lidt", { M }, 0 },
>    },
> +
> +  /* X86_64_VEX_0F3849_P_0_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "ldtilecfg", { M }, 0 },
> +  },
> +
> +  /* X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tilerelease", { Skip_MODRM }, 0 },
> +  },
> +
> +  /* X86_64_VEX_0F3849_P_2_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "sttilecfg", { M }, 0 },
> +  },
> +
> +  /* X86_64_VEX_0F3849_P_3_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tilezero", { TMM, Skip_MODRM }, 0 },
> +  },
> +
> +  /* X86_64_VEX_0F384B_P_1_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tilestored", { M, TMM }, 0 },
> +  },
> +
> +  /* X86_64_VEX_0F384B_P_2_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tileloaddt1", { TMM, M }, 0 },
> +  },
> +
> +  /* X86_64_VEX_0F384B_P_3_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tileloadd", { TMM, M }, 0 },
> +  },
> +
> +  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tdpbf16ps", { TMM, EXtmm, VexTmm }, 0 },
> +  },


As a minor remark - from here to ...

> +  /* X86_64_VEX_0F385E_P_0_W_0_M_0_L_0 */
> +  {
> +    { Bad_Opcode },
> +    { "tdpbuud", {TMM, EXtmm, VexTmm}, 0 },
> +  },


... here (and further down) you started losing blanks inside
the inner braces.

Jan
Alan Modra via Binutils July 8, 2020, 8:49 a.m. | #33
> What about the high bit of VEX.VVVV being zero?
>
> What about the case of there not being a SIB byte?
>
> What about the case of any two operands being the same, which I think the
> assembler also still doesn't error on, as one can see ...
>

I added them this time.

> tmm3,tmm4,tmm5
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c1[ 	]*tdpbssd tmm0,tmm1,tmm1
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 73 5e c8[ 	]*tdpbssd tmm1,tmm0,tmm1
> > +[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 5e c9[ 	]*tdpbssd tmm1,tmm1,tmm0
>
> ... here (in my earlier reply I had specifically given the comment in the
> context of the "inval" test).


Sorry, I misunderstood it before.  Below is my updated patch, thanks.

Subject: [PATCH] x86: Add support for Intel AMX instructions

gas/
	* doc/c-i386.texi: Document amx_int8, amx_bf16 and amx_tile.
	* config/tc-i386.c (i386_error): Add invalid_sib_address.
	(cpu_arch): Add .amx_int8, .amx_bf16 and .amx_tile.
	(cpu_noarch): Add noamx_int8, noamx_bf16 and noamx_tile.
	(match_simd_size): Add tmmword check.
	(operand_type_match): Add tmmword.
	(type_names): Add rTMM.
	(i386_error): Add invalid_tmm_register_set.
	(check_VecOperands): Handle invalid_sib_address and
	invalid_tmm_register_set.
	(match_template): Handle invalid_sib_address.
	(build_modrm_byte): Handle non-vector SIB and tmmword.
	(i386_index_check): Disallow RegIP for non-vector SIB.
	(check_register): Handle tmmword.
	* testsuite/gas/i386/i386.exp: Add new AMX tests.
	* testsuite/gas/i386/intel-regs.d: Add tmm.
	* testsuite/gas/i386/intel-regs.s: Add tmm.
	* testsuite/gas/i386/x86-64-amx-intel.d: New.
	* testsuite/gas/i386/x86-64-amx-inval.l: New.
	* testsuite/gas/i386/x86-64-amx-inval.s: New.
	* testsuite/gas/i386/x86-64-amx.d: New.
	* testsuite/gas/i386/x86-64-amx.s: New.
	* testsuite/gas/i386/x86-64-amx-bad.d: New.
	* testsuite/gas/i386/x86-64-amx-bad.s: New.

opcodes/
	* i386-dis.c (TMM): New.
	(EXtmm): Likewise.
	(VexTmm): Likewise.
	(MVexSIBMEM): Likewise.
	(vex_sibmem_mode): Likewise.
	(tmm_mode): Likewise.
	(REG_VEX_0F3849_P_0_W_0_M_1): Likewise.
	(MOD_VEX_0F3849_P_0_W_0): Likewise.
	(MOD_VEX_0F3849_P_2_W_0): Likewise.
	(MOD_VEX_0F3849_P_3_W_0): Likewise.
	(MOD_VEX_0F384B_P_1_W_0): Likewise.
	(MOD_VEX_0F384B_P_2_W_0): Likewise.
	(MOD_VEX_0F384B_P_3_W_0): Likewise.
	(MOD_VEX_0F385C_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_0_W_0): Likewise.
	(MOD_VEX_0F385E_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_2_W_0): Likewise.
	(MOD_VEX_0F385E_P_3_W_0): Likewise.
	(RM_VEX_0F3849_P_0_W_0_M_1_R_0): Likewise.
	(PREFIX_VEX_0F3849): Likewise.
	(PREFIX_VEX_0F384B): Likewise.
	(PREFIX_VEX_0F385C): Likewise.
	(PREFIX_VEX_0F385E): Likewise.
	(X86_64_0F01_REG_3): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385C_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_3_W_0_M_0_L_0): Likewise.
	(VEX_W_0F3849_P_0): Likewise.
	(VEX_W_0F3849_P_2): Likewise.
	(VEX_W_0F3849_P_3): Likewise.
	(VEX_W_0F384B_P_1): Likewise.
	(VEX_W_0F384B_P_2): Likewise.
	(VEX_W_0F384B_P_3): Likewise.
	(VEX_W_0F385C_P_1): Likewise.
	(VEX_W_0F385E_P_0): Likewise.
	(VEX_W_0F385E_P_1): Likewise.
	(VEX_W_0F385E_P_2): Likewise.
	(VEX_W_0F385E_P_3): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0): Likewise.
	(VEX_LEN_0F3849_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F385C_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_3_W_0_M_0): Likewise.
	(names_tmm): Likewise.
	(att_names_tmm): Likewise.
	(intel_operand_size): Handle void_mode.
	(OP_XMM): Handle tmm_mode.
	(OP_EX): Likewise.
	(OP_VEX): Likewise.
	* i386-gen.c (cpu_flag_init): Add entries for
	CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(operand_type_shorthands): Add RegTMM.
	(operand_type_init): Likewise.
	(operand_types): Add Tmmword.
	(cpu_flag_init): Add CPU_AMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(cpu_flags): Add CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	* i386-opc.h (CpuAMX_INT8): New.
	(CpuAMX_BF16): Likewise.
	(CpuAMX_TILE): Likewise.
	(SIBMEM): Likewise.
	(Tmmword): Likewise.
	(i386_cpu_flags): Add cpuamx_int8, cpuamx_bf16 and cpuamx_tile.
	(i386_opcode_modifier): Extend width of fields vexvvvv and sib.
	(i386_operand_type): Add tmmword.
	* i386-opc.tbl: Add AMX instructions.
	* i386-reg.tbl: Add AMX registers.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
---
 gas/config/tc-i386.c                      |  97 +++++-
 gas/doc/c-i386.texi                       |   7 +
 gas/testsuite/gas/i386/i386.exp           |   4 +
 gas/testsuite/gas/i386/intel-regs.d       |   4 +
 gas/testsuite/gas/i386/intel-regs.s       |   4 +
 gas/testsuite/gas/i386/x86-64-amx-bad.d   |  20 ++
 gas/testsuite/gas/i386/x86-64-amx-bad.s   |  40 +++
 gas/testsuite/gas/i386/x86-64-amx-intel.d |  70 ++++
 gas/testsuite/gas/i386/x86-64-amx-inval.l |  17 +
 gas/testsuite/gas/i386/x86-64-amx-inval.s |  22 ++
 gas/testsuite/gas/i386/x86-64-amx.d       |  70 ++++
 gas/testsuite/gas/i386/x86-64-amx.s       |  61 ++++
 opcodes/i386-dis.c                        | 390 +++++++++++++++++++++-
 opcodes/i386-gen.c                        |  18 +
 opcodes/i386-opc.h                        |  16 +-
 opcodes/i386-opc.tbl                      |  23 ++
 opcodes/i386-reg.tbl                      |   9 +
 17 files changed, 853 insertions(+), 19 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 2e0eb24753..96f9d2a926 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -290,8 +290,10 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
+    invalid_tmm_register_set,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -372,6 +374,9 @@ struct _i386_insn
     /* Has ZMM register operands.  */
     bfd_boolean has_regzmm;
 
+    /* Has TMM register operands.  */
+    bfd_boolean has_regtmm;
+
     /* Has GOTPC or TLS relocation.  */
     bfd_boolean has_gotpc_tls_reloc;
 
@@ -1201,6 +1206,12 @@ static const arch_entry cpu_arch[] =
     CPU_WAITPKG_FLAGS, 0 },
   { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,
     CPU_CLDEMOTE_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_int8"), PROCESSOR_UNKNOWN,
+    CPU_AMX_INT8_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_bf16"), PROCESSOR_UNKNOWN,
+    CPU_AMX_BF16_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_tile"), PROCESSOR_UNKNOWN,
+    CPU_AMX_TILE_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdiri"), PROCESSOR_UNKNOWN,
     CPU_MOVDIRI_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdir64b"), PROCESSOR_UNKNOWN,
@@ -1259,6 +1270,9 @@ static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },
   { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },
   { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },
+  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },
+  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },
+  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },
   { STRING_COMMA_LEN ("nomovdiri"), CPU_ANY_MOVDIRI_FLAGS },
   { STRING_COMMA_LEN ("nomovdir64b"), CPU_ANY_MOVDIR64B_FLAGS },
   { STRING_COMMA_LEN ("noavx512_bf16"), CPU_ANY_AVX512_BF16_FLAGS },
@@ -2159,7 +2173,9 @@ match_simd_size (const insn_template *t, unsigned int wanted,
 	   || (i.types[given].bitfield.ymmword
 	       && !t->operand_types[wanted].bitfield.ymmword)
 	   || (i.types[given].bitfield.zmmword
-	       && !t->operand_types[wanted].bitfield.zmmword));
+	       && !t->operand_types[wanted].bitfield.zmmword)
+	   || (i.types[given].bitfield.tmmword
+	       && !t->operand_types[wanted].bitfield.tmmword));
 }
 
 /* Return 1 if there is no conflict in any size between operand GIVEN
@@ -2296,6 +2312,7 @@ operand_type_match (i386_operand_type overlap,
   temp.bitfield.xmmword = 0;
   temp.bitfield.ymmword = 0;
   temp.bitfield.zmmword = 0;
+  temp.bitfield.tmmword = 0;
   if (operand_type_all_zero (&temp))
     goto mismatch;
 
@@ -3304,6 +3321,7 @@ const type_names[] =
   { OPERAND_TYPE_REGXMM, "rXMM" },
   { OPERAND_TYPE_REGYMM, "rYMM" },
   { OPERAND_TYPE_REGZMM, "rZMM" },
+  { OPERAND_TYPE_REGTMM, "rTMM" },
   { OPERAND_TYPE_REGMASK, "Mask reg" },
 };
 
@@ -5790,7 +5808,7 @@ check_VecOperands (const insn_template *t)
 
   /* For VSIB byte, we need a vector register for index, and all vector
      registers must be distinct.  */
-  if (t->opcode_modifier.sib)
+  if (t->opcode_modifier.sib && t->opcode_modifier.sib != SIBMEM)
     {
       if (!i.index_reg
 	  || !((t->opcode_modifier.sib == VECSIB128
@@ -5849,6 +5867,23 @@ check_VecOperands (const insn_template *t)
 	}
     }
 
+  /* For AMX instructions with three tmmword operands, all tmmword operands
+     must be distinct.  */
+  if (t->operand_types[0].bitfield.tmmword
+      && i.reg_operands == 3)
+    {
+      if (register_number (i.op[0].regs)
+          == register_number (i.op[1].regs)
+          || register_number (i.op[0].regs)
+             == register_number (i.op[2].regs)
+          || register_number (i.op[1].regs)
+             == register_number (i.op[2].regs))
+	{
+	  i.error = invalid_tmm_register_set;
+	  return 1;
+	}
+    }
+
   /* Check if broadcast is supported by the instruction and is applied
      to the memory operand.  */
   if (i.broadcast)
@@ -6584,12 +6619,18 @@ match_template (char mnem_suffix)
 	  as_bad (_("unsupported instruction `%s'"),
 		  current_templates->start->name);
 	  return NULL;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
 	case invalid_vsib_address:
 	  err_msg = _("invalid VSIB address");
 	  break;
 	case invalid_vector_register_set:
 	  err_msg = _("mask, index, and destination registers must be distinct");
 	  break;
+	case invalid_tmm_register_set:
+	  err_msg = _("tmm register must be distinct");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -7923,8 +7964,11 @@ build_modrm_byte (void)
 	  else if (i.op[dest].regs->reg_type.bitfield.class == RegSIMD
 		   || i.op[source].regs->reg_type.bitfield.class == RegSIMD)
 	    {
-	      if (i.types[dest].bitfield.zmmword
-		  || i.types[source].bitfield.zmmword)
+	      if (i.types[dest].bitfield.tmmword
+		  || i.types[source].bitfield.tmmword)
+		i.has_regtmm = TRUE;
+	      else if (i.types[dest].bitfield.zmmword
+		       || i.types[source].bitfield.zmmword)
 		i.has_regzmm = TRUE;
 	      else if (i.types[dest].bitfield.ymmword
 		       || i.types[source].bitfield.ymmword)
@@ -7966,7 +8010,9 @@ build_modrm_byte (void)
 
 	  if (i.tm.opcode_modifier.sib)
 	    {
-	      if (i.index_reg->reg_num == RegIZ)
+	      /* The index register of VSIB shouldn't be RegIZ.  */
+	      if (i.tm.opcode_modifier.sib != SIBMEM
+		  && i.index_reg->reg_num == RegIZ)
 		abort ();
 
 	      i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
@@ -7989,8 +8035,19 @@ build_modrm_byte (void)
 		      i.types[op].bitfield.disp32s = 1;
 		    }
 		}
-	      i.sib.index = i.index_reg->reg_num;
-	      set_rex_vrex (i.index_reg, REX_X, FALSE);
+
+	      /* The mandatory SIB always has an index register, so
+		 the code logic remains unchanged.  The non-mandatory SIB
+		 without an index register is allowed and will be handled
+		 later.  */
+	      if (i.index_reg)
+		{
+		  if (i.index_reg->reg_num == RegIZ)
+		    i.sib.index = NO_INDEX_REGISTER;
+		  else
+		    i.sib.index = i.index_reg->reg_num;
+		  set_rex_vrex (i.index_reg, REX_X, FALSE);
+		}
 	    }
 
 	  default_seg = &ds;
@@ -8004,7 +8061,9 @@ build_modrm_byte (void)
 		{
 		  i386_operand_type newdisp;
 
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Check for both VSIB and mandatory non-vector SIB.  */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);
 		  /* Operand is just <disp>  */
 		  if (flag_code == CODE_64BIT)
 		    {
@@ -8142,7 +8201,11 @@ build_modrm_byte (void)
 	      i.sib.scale = i.log2_scale_factor;
 	      if (i.index_reg == 0)
 		{
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only check for VSIB. */
+		  gas_assert (i.tm.opcode_modifier.sib != VECSIB128
+			      && i.tm.opcode_modifier.sib != VECSIB256
+			      && i.tm.opcode_modifier.sib != VECSIB512);
+
 		  /* <disp>(%esp) becomes two byte modrm with no index
 		     register.  We've already stored the code for esp
 		     in i.rm.regmem ie. ESCAPE_TO_TWO_BYTE_ADDRESSING.
@@ -8267,7 +8330,9 @@ build_modrm_byte (void)
 		break;
 	      if (i.types[op].bitfield.class == RegSIMD)
 		{
-		  if (i.types[op].bitfield.zmmword)
+		  if (i.types[op].bitfield.tmmword)
+		    i.has_regtmm = TRUE;
+		  else if (i.types[op].bitfield.zmmword)
 		    i.has_regzmm = TRUE;
 		  else if (i.types[op].bitfield.ymmword)
 		    i.has_regymm = TRUE;
@@ -10926,9 +10991,10 @@ i386_index_check (const char *operand_string)
 		      || !i.index_reg->reg_type.bitfield.baseindex)))
 	    goto bad_address;
 
-	  /* bndmk, bndldx, and bndstx have special restrictions. */
+	  /* bndmk, bndldx, bndstx and mandatory non-vector SIB have special restrictions. */
 	  if (current_templates->start->base_opcode == 0xf30f1b
-	      || (current_templates->start->base_opcode & ~1) == 0x0f1a)
+	      || (current_templates->start->base_opcode & ~1) == 0x0f1a
+	      || current_templates->start->opcode_modifier.sib == SIBMEM)
 	    {
 	      /* They cannot use RIP-relative addressing. */
 	      if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -10938,7 +11004,7 @@ i386_index_check (const char *operand_string)
 		}
 
 	      /* bndldx and bndstx ignore their scale factor. */
-	      if (current_templates->start->base_opcode != 0xf30f1b
+	      if ((current_templates->start->base_opcode & ~1) == 0x0f1a
 		  && i.log2_scale_factor)
 		as_warn (_("register scaling is being ignored here"));
 	    }
@@ -12440,6 +12506,11 @@ static bfd_boolean check_register (const reg_entry *r)
 	}
     }
 
+  if (r->reg_type.bitfield.tmmword
+      && (!cpu_arch_flags.bitfield.cpuamx_tile
+          || flag_code != CODE_64BIT))
+    return FALSE;
+
   if (r->reg_type.bitfield.class == RegBND && !cpu_arch_flags.bitfield.cpumpx)
     return FALSE;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index d4e6fcb698..cb86cc7968 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -226,6 +226,12 @@ accept various extension mnemonics.  For example,
 @code{noenqcmd},
 @code{noserialize},
 @code{notsxldtrk},
+@code{amx_int8},
+@code{noamx_int8},
+@code{amx_bf16},
+@code{noamx_bf16},
+@code{amx_tile},
+@code{noamx_tile},
 @code{vmx},
 @code{vmfunc},
 @code{smx},
@@ -1504,6 +1510,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
+@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 55929d3acb..bd4adb07ef 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -1137,6 +1137,10 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"
+    run_list_test "x86-64-amx-inval"
+    run_dump_test "x86-64-amx"
+    run_dump_test "x86-64-amx-intel"
+    run_dump_test "x86-64-amx-bad"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/intel-regs.d b/gas/testsuite/gas/i386/intel-regs.d
index 65bcb6ca7d..480b291c91 100644
--- a/gas/testsuite/gas/i386/intel-regs.d
+++ b/gas/testsuite/gas/i386/intel-regs.d
@@ -6,6 +6,7 @@
 
 Disassembly of section \.text:
 0+0 <.*>:
+.*[ 	]+R_386_32[ 	]+tmm1
 .*[ 	]+R_386_16[ 	]+eax
 .*[ 	]+R_386_16[ 	]+rax
 .*[ 	]+R_386_16[ 	]+axl
@@ -53,4 +54,7 @@ Disassembly of section \.text:
 
 .* <ymm8>:
 .*[ 	]+<ymm8>
+
+.* <tmm0>:
+.*[ 	]+<tmm0>
 #pass
diff --git a/gas/testsuite/gas/i386/intel-regs.s b/gas/testsuite/gas/i386/intel-regs.s
index 66ab16dfc5..44e369bb0f 100644
--- a/gas/testsuite/gas/i386/intel-regs.s
+++ b/gas/testsuite/gas/i386/intel-regs.s
@@ -1,6 +1,8 @@
 	.text
 	.intel_syntax noprefix
 
+	mov	eax, tmm1
+
 	.arch i286
 	.code16
 	mov	ax, eax			; add	[bx+si], al
@@ -59,3 +61,5 @@
 	mov	rax, r8
 ymm8:
 	jmp	ymm8
+tmm0:
+	jmp	tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.d b/gas/testsuite/gas/i386/x86-64-amx-bad.d
new file mode 100644
index 0000000000..2957b6a15b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.d
@@ -0,0 +1,20 @@
+#as:
+#objdump: -drw
+#name: x86_64 AMX insns
+#source: x86-64-amx-bad.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <\.text>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 d2 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 56 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 62 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,\(bad\)
+[ 	]*[a-f0-9]+:[ 	]*c4 c2 52 5c dc[ 	]*tdpbf16ps %tmm5,\(bad\),%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 32 5c dc[ 	]*tdpbf16ps \(bad\),%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 09[ 	]*tileloadd \(bad\),%tmm1
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.s b/gas/testsuite/gas/i386/x86-64-amx-bad.s
new file mode 100644
index 0000000000..f0db1a9493
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
@@ -0,0 +1,40 @@
+.text
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0xd2
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.L = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x56
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.R = 0 (illegal value).
+	.byte 0xc4
+	.byte 0x62
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
+	.byte 0xc4
+	.byte 0xc2
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.VVVV = 0110 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x32
+	.byte 0x5c
+	.byte 0xdc
+	#tileloadd (%rax),%tmm1 set R/M= 001 (illegal value) without SIB.
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x7b
+	.byte 0x4b
+	.byte 0x09
+
diff --git a/gas/testsuite/gas/i386/x86-64-amx-intel.d b/gas/testsuite/gas/i386/x86-64-amx-intel.d
new file mode 100644
index 0000000000..fc5e0745ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d
@@ -0,0 +1,70 @@
+#as:
+#objdump: -d -Mintel
+#name: x86_64 AMX insns in Intel syntax
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.l b/gas/testsuite/gas/i386/x86-64-amx-inval.l
new file mode 100644
index 0000000000..e7c284fd71
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.l
@@ -0,0 +1,17 @@
+.* Assembler messages:
+.*:5: Error: `\(%rip\)' cannot be used here
+.*:6: Error: `\(%rip\)' cannot be used here
+.*:7: Error: `\(%rip\)' cannot be used here
+.*:8: Error: operand size mismatch for `tdpbssd'
+.*:9: Error: operand size mismatch for `vaddps'
+.*:10: Error: tmm register must be distinct for `tdpbssd'
+.*:11: Error: tmm register must be distinct for `tdpbssd'
+.*:12: Error: tmm register must be distinct for `tdpbssd'
+.*:15: Error: `\[rip\]' cannot be used here
+.*:16: Error: `\[rip\]' cannot be used here
+.*:17: Error: `\[rip\]' cannot be used here
+.*:18: Error: operand size mismatch for `tdpbssd'
+.*:19: Error: operand size mismatch for `vaddps'
+.*:20: Error: tmm register must be distinct for `tdpbssd'
+.*:21: Error: tmm register must be distinct for `tdpbssd'
+.*:22: Error: tmm register must be distinct for `tdpbssd'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.s b/gas/testsuite/gas/i386/x86-64-amx-inval.s
new file mode 100644
index 0000000000..6e29453669
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.s
@@ -0,0 +1,22 @@
+# Check illegal SIBMEM and register size used in AMX instructions
+
+    .text
+_start:
+    tileloadd (%rip), %tmm1
+    tileloaddt1 (%rip), %tmm1
+    tilestored %tmm1, (%rip)
+    tdpbssd %xmm1, %xmm2, %xmm3
+    vaddps %tmm1, %tmm2, %tmm3
+    tdpbssd %tmm1, %tmm1, %tmm0
+    tdpbssd %tmm1, %tmm0, %tmm1
+    tdpbssd %tmm0, %tmm1, %tmm1
+
+    .intel_syntax noprefix
+    tileloadd tmm1, [rip]
+    tileloaddt1 tmm1, [rip]
+    tilestored [rip], tmm1
+    tdpbssd xmm3, xmm2, xmm1
+    vaddps %tmm1, %tmm2, %tmm3
+    tdpbssd tmm0, tmm1, tmm1
+    tdpbssd tmm1, tmm0, tmm1
+    tdpbssd tmm1, tmm1, tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx.d b/gas/testsuite/gas/i386/x86-64-amx.d
new file mode 100644
index 0000000000..ad6f42240b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.d
@@ -0,0 +1,70 @@
+#as:
+#objdump: -d
+#name: x86_64 AMX insns
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx.s b/gas/testsuite/gas/i386/x86-64-amx.s
new file mode 100644
index 0000000000..c70543152b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.s
@@ -0,0 +1,61 @@
+
+  .allow_index_reg
+  .text
+_start:
+  ldtilecfg  (%rcx,%rdx,2)
+  sttilecfg  (%rcx,%rdx,2)
+  tdpbf16ps %tmm5, %tmm4, %tmm3
+  tdpbssd %tmm3, %tmm2, %tmm1
+  tdpbsud %tmm3, %tmm2, %tmm1
+  tdpbusd %tmm3, %tmm2, %tmm1
+  tdpbuud %tmm3, %tmm2, %tmm1
+  tileloadd foo, %tmm5
+  tileloadd (%rcx), %tmm5
+  tileloadd (%ecx), %tmm5
+  tileloadd (%rcx,%rdx,1), %tmm5
+  tileloadd (%ecx,%edx,2), %tmm1
+  tileloaddt1 foo, %tmm5
+  tileloaddt1 (%rcx), %tmm5
+  tileloaddt1 (%ecx), %tmm5
+  tileloaddt1 (%rcx,%rdx,1), %tmm5
+  tileloaddt1 (%ecx,%edx,2), %tmm1
+  tileloaddt1 (%rcx,%riz,2), %tmm1
+  tilerelease
+  tilestored %tmm5, (%rcx)
+  tilestored %tmm5, (%ecx)
+  tilestored %tmm5, (%rcx,%rdx,1)
+  tilestored %tmm1, (%ecx,%edx,2)
+  tilezero %tmm0
+  tilezero %tmm5
+  tilezero %tmm7
+
+
+  .intel_syntax noprefix
+  ldtilecfg  [rcx]
+  ldtilecfg  [rbx]
+  sttilecfg  [rcx]
+  sttilecfg  [rbx]
+  tdpbf16ps tmm3, tmm4, tmm5
+  tdpbssd tmm1, tmm2, tmm3
+  tdpbsud tmm1, tmm2, tmm3
+  tdpbusd tmm1, tmm2, tmm3
+  tdpbuud tmm1, tmm2, tmm3
+  tileloadd tmm5, foo
+  tileloadd tmm5, [rcx]
+  tileloadd tmm5, [ecx]
+  tileloadd tmm5, [rcx+rdx]
+  tileloadd tmm1, [ecx+edx*2]
+  tileloaddt1 tmm5, foo
+  tileloaddt1 tmm5, [rcx]
+  tileloaddt1 tmm5, [ecx]
+  tileloaddt1 tmm5, [rcx+rdx]
+  tileloaddt1 tmm1, [ecx+edx*2]
+  tileloaddt1 tmm1, [rcx+riz*2]
+  tilerelease
+  tilestored [rcx], tmm5
+  tilestored [ecx], tmm5
+  tilestored [rcx+rdx], tmm5
+  tilestored [ecx+edx*2], tmm1
+  tilezero tmm0
+  tilezero tmm5
+  tilezero tmm7
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 956e2c3539..2b4ad3cd4e 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -375,6 +375,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define XMScalar { OP_XMM, scalar_mode }
 #define XMGatherQ { OP_XMM, vex_vsib_q_w_dq_mode }
 #define XMM { OP_XMM, xmm_mode }
+#define TMM { OP_XMM, tmm_mode }
 #define XMxmmq { OP_XMM, xmmq_mode }
 #define EM { OP_EM, v_mode }
 #define EMS { OP_EM, v_swap_mode }
@@ -391,6 +392,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXxS { OP_EX, x_swap_mode }
 #define EXxmm { OP_EX, xmm_mode }
 #define EXymm { OP_EX, ymm_mode }
+#define EXtmm { OP_EX, tmm_mode }
 #define EXxmmq { OP_EX, xmmq_mode }
 #define EXEvexHalfBcstXmmq { OP_EX, evex_half_bcst_xmmq_mode }
 #define EXxmm_mb { OP_EX, xmm_mb_mode }
@@ -421,6 +423,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Vex128 { OP_VEX, vex128_mode }
 #define Vex256 { OP_VEX, vex256_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexTmm { OP_VEX, tmm_mode }
 #define EXdVexScalarS { OP_EX_Vex, d_scalar_swap_mode }
 #define EXqVexScalarS { OP_EX_Vex, q_scalar_swap_mode }
 #define EXVexW { OP_EX_VexW, x_mode }
@@ -451,6 +454,8 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define MVexVSIBQWpX { OP_M, vex_vsib_q_w_dq_mode }
 #define MVexVSIBQDWpX { OP_M, vex_vsib_q_w_d_mode }
 
+#define MVexSIBMEM { OP_M, vex_sibmem_mode }
+
 /* Used handle "rep" prefix for string instructions.  */
 #define Xbr { REP_Fixup, eSI_reg }
 #define Xvr { REP_Fixup, eSI_reg }
@@ -542,6 +547,8 @@ enum
   ymmq_mode,
   /* 32-byte YMM or 16-byte word operand */
   ymmxmm_mode,
+  /* TMM operand */
+  tmm_mode,
   /* d_mode in 32bit, q_mode in 64bit mode.  */
   m_mode,
   /* pair of v_mode operands */
@@ -595,6 +602,8 @@ enum
   vex_vsib_q_w_dq_mode,
   /* Similar to vex_vsib_q_w_dq_mode, with smaller memory.  */
   vex_vsib_q_w_d_mode,
+  /* mandatory non-vector SIB.  */
+  vex_sibmem_mode,
 
   /* scalar, ignore vector length.  */
   scalar_mode,
@@ -743,6 +752,7 @@ enum
   REG_VEX_0F72,
   REG_VEX_0F73,
   REG_VEX_0FAE,
+  REG_VEX_0F3849_P_0_W_0_M_1,
   REG_VEX_0F38F3,
   REG_XOP_LWPCB,
   REG_XOP_LWP,
@@ -826,6 +836,17 @@ enum
   MOD_0FE7_PREFIX_2,
   MOD_0FF0_PREFIX_3,
   MOD_0F382A_PREFIX_2,
+  MOD_VEX_0F3849_P_0_W_0,
+  MOD_VEX_0F3849_P_2_W_0,
+  MOD_VEX_0F3849_P_3_W_0,
+  MOD_VEX_0F384B_P_1_W_0,
+  MOD_VEX_0F384B_P_2_W_0,
+  MOD_VEX_0F384B_P_3_W_0,
+  MOD_VEX_0F385C_P_1_W_0,
+  MOD_VEX_0F385E_P_0_W_0,
+  MOD_VEX_0F385E_P_1_W_0,
+  MOD_VEX_0F385E_P_2_W_0,
+  MOD_VEX_0F385E_P_3_W_0,
   MOD_0F38F5_PREFIX_2,
   MOD_0F38F6_PREFIX_0,
   MOD_0F38F8_PREFIX_1,
@@ -963,6 +984,7 @@ enum
   RM_0F1E_P_1_MOD_3_REG_7,
   RM_0FAE_REG_6_MOD_3_P_0,
   RM_0FAE_REG_7_MOD_3,
+  RM_VEX_0F3849_P_0_W_0_M_1_R_0
 };
 
 enum
@@ -1298,9 +1320,13 @@ enum
   PREFIX_VEX_0F3845,
   PREFIX_VEX_0F3846,
   PREFIX_VEX_0F3847,
+  PREFIX_VEX_0F3849,
+  PREFIX_VEX_0F384B,
   PREFIX_VEX_0F3858,
   PREFIX_VEX_0F3859,
   PREFIX_VEX_0F385A,
+  PREFIX_VEX_0F385C,
+  PREFIX_VEX_0F385E,
   PREFIX_VEX_0F3878,
   PREFIX_VEX_0F3879,
   PREFIX_VEX_0F388C,
@@ -1673,7 +1699,19 @@ enum
   X86_64_0F01_REG_0,
   X86_64_0F01_REG_1,
   X86_64_0F01_REG_2,
-  X86_64_0F01_REG_3
+  X86_64_0F01_REG_3,
+  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0,
+  X86_64_VEX_0F3849_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F385C_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_3_W_0_M_0_L_0
 };
 
 enum
@@ -1758,7 +1796,19 @@ enum
   VEX_LEN_0F381A_P_2_M_0,
   VEX_LEN_0F3836_P_2,
   VEX_LEN_0F3841_P_2,
+  VEX_LEN_0F3849_P_0_W_0_M_0,
+  VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0,
+  VEX_LEN_0F3849_P_2_W_0_M_0,
+  VEX_LEN_0F3849_P_3_W_0_M_0,
+  VEX_LEN_0F384B_P_1_W_0_M_0,
+  VEX_LEN_0F384B_P_2_W_0_M_0,
+  VEX_LEN_0F384B_P_3_W_0_M_0,
   VEX_LEN_0F385A_P_2_M_0,
+  VEX_LEN_0F385C_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_0_W_0_M_0,
+  VEX_LEN_0F385E_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_2_W_0_M_0,
+  VEX_LEN_0F385E_P_3_W_0_M_0,
   VEX_LEN_0F38DB_P_2,
   VEX_LEN_0F38F2_P_0,
   VEX_LEN_0F38F3_R_1_P_0,
@@ -1926,9 +1976,20 @@ enum
   VEX_W_0F382F_P_2_M_0,
   VEX_W_0F3836_P_2,
   VEX_W_0F3846_P_2,
+  VEX_W_0F3849_P_0,
+  VEX_W_0F3849_P_2,
+  VEX_W_0F3849_P_3,
+  VEX_W_0F384B_P_1,
+  VEX_W_0F384B_P_2,
+  VEX_W_0F384B_P_3,
   VEX_W_0F3858_P_2,
   VEX_W_0F3859_P_2,
   VEX_W_0F385A_P_2_M_0,
+  VEX_W_0F385C_P_1,
+  VEX_W_0F385E_P_0,
+  VEX_W_0F385E_P_1,
+  VEX_W_0F385E_P_2,
+  VEX_W_0F385E_P_3,
   VEX_W_0F3878_P_2,
   VEX_W_0F3879_P_2,
   VEX_W_0F38CF_P_2,
@@ -3045,6 +3106,16 @@ static const char *att_names_zmm[] = {
   "%zmm28", "%zmm29", "%zmm30", "%zmm31"
 };
 
+static const char **names_tmm;
+static const char *intel_names_tmm[] = {
+  "tmm0", "tmm1", "tmm2", "tmm3",
+  "tmm4", "tmm5", "tmm6", "tmm7"
+};
+static const char *att_names_tmm[] = {
+  "%tmm0", "%tmm1", "%tmm2", "%tmm3",
+  "%tmm4", "%tmm5", "%tmm6", "%tmm7"
+};
+
 static const char **names_mask;
 static const char *intel_names_mask[] = {
   "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"
@@ -3413,6 +3484,10 @@ static const struct dis386 reg_table[][8] = {
     { MOD_TABLE (MOD_VEX_0FAE_REG_2) },
     { MOD_TABLE (MOD_VEX_0FAE_REG_3) },
   },
+  /* REG_VEX_0F3849_P_0_W_0_M_1 */
+  {
+    { RM_TABLE (RM_VEX_0F3849_P_0_W_0_M_1_R_0) },
+  },
   /* REG_VEX_0F38F3 */
   {
     { Bad_Opcode },
@@ -5794,6 +5869,22 @@ static const struct dis386 prefix_table[][4] = {
     { "vpsllv%LW", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F3849 */
+  {
+    { VEX_W_TABLE (VEX_W_0F3849_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3849_P_2) },
+    { VEX_W_TABLE (VEX_W_0F3849_P_3) },
+  },
+
+  /* PREFIX_VEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384B_P_1) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_2) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_3) },
+  },
+
   /* PREFIX_VEX_0F3858 */
   {
     { Bad_Opcode },
@@ -5815,6 +5906,21 @@ static const struct dis386 prefix_table[][4] = {
     { MOD_TABLE (MOD_VEX_0F385A_PREFIX_2) },
   },
 
+  /* PREFIX_VEX_0F385C */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F385C_P_1) },
+    { Bad_Opcode },
+  },
+
+  /* PREFIX_VEX_0F385E */
+  {
+    { VEX_W_TABLE (VEX_W_0F385E_P_0) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_1) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_2) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_3) },
+  },
+
   /* PREFIX_VEX_0F3878 */
   {
     { Bad_Opcode },
@@ -6830,6 +6936,78 @@ static const struct dis386 x86_64_table[][2] = {
     { "lidt{Q|Q}", { M }, 0 },
     { "lidt", { M }, 0 },
   },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "ldtilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilerelease", { Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "sttilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilezero", { TMM, Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilestored", { MVexSIBMEM, TMM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloaddt1", { TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloadd", { TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbf16ps", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbuud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbsud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbusd", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbssd", { TMM, EXtmm, VexTmm }, 0 },
+  },
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8671,9 +8849,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3847) },
     /* 48 */
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -8692,9 +8870,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3859) },
     { PREFIX_TABLE (PREFIX_VEX_0F385A) },
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385C) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385E) },
     { Bad_Opcode },
     /* 60 */
     { Bad_Opcode },
@@ -9432,12 +9610,72 @@ static const struct dis386 vex_len_table[][2] = {
     { "vphminposuw",	{ XM, EXx }, 0 },
   },
 
+  /* VEX_LEN_0F3849_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_3_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F385A_P_2_M_0 */
   {
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F385A_P_2_M_0) },
   },
 
+  /* VEX_LEN_0F385C_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385C_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F38DB_P_2 */
   {
     { "vaesimc",	{ XM, EXx }, 0 },
@@ -9930,6 +10168,30 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3846_P_2 */
     { "vpsravd",	{ XM, Vex, EXx }, 0 },
   },
+  {
+    /* VEX_W_0F3849_P_0 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_2 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_3 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_1 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_2 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_3 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3858_P_2 */
     { "vpbroadcastd", { XM, EXxmm_md }, 0 },
@@ -9942,6 +10204,26 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F385A_P_2_M_0 */
     { "vbroadcasti128", { XM, Mxmm }, 0 },
   },
+  {
+    /* VEX_W_0F385C_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385C_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_0 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_2 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_3 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3878_P_2 */
     { "vpbroadcastb",	{ XM, EXxmm_mb }, 0 },
@@ -10388,6 +10670,57 @@ static const struct dis386 mod_table[][2] = {
     /* MOD_0F382A_PREFIX_2 */
     { "movntdqa",	{ XM, Mx }, 0 },
   },
+  {
+    /* MOD_VEX_0F3849_P_0_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_0) },
+    { REG_TABLE (REG_VEX_0F3849_P_0_W_0_M_1) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_1_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_3_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385C_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385C_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_0_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_0_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_2_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_3_W_0_M_0) },
+  },
   {
     /* MOD_0F38F5_PREFIX_2 */
     { "wrussK",		{ M, Gdq }, PREFIX_OPCODE },
@@ -10949,6 +11282,10 @@ static const struct dis386 rm_table[][8] = {
     { "sfence",		{ Skip_MODRM }, 0 },
 
   },
+  {
+    /* RM_VEX_0F3849_P_0_W_0_M_1_R_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0) },
+  },
 };
 
 #define INTERNAL_DISASSEMBLER_ERROR _("<internal disassembler error>")
@@ -11845,6 +12182,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = intel_names_xmm;
       names_ymm = intel_names_ymm;
       names_zmm = intel_names_zmm;
+      names_tmm = intel_names_tmm;
       index64 = intel_index64;
       index32 = intel_index32;
       names_mask = intel_names_mask;
@@ -11867,6 +12205,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = att_names_xmm;
       names_ymm = att_names_ymm;
       names_zmm = att_names_zmm;
+      names_tmm = att_names_tmm;
       index64 = att_index64;
       index32 = att_index32;
       names_mask = att_names_mask;
@@ -14023,6 +14362,15 @@ OP_E_memory (int bytemode, int sizeflag)
 	  base = sib.base;
 	  codep++;
 	}
+      else
+	{
+	  /* A mandatory non-vector SIB operand must have a SIB byte.  */
+	  if (bytemode == vex_sibmem_mode)
+	    {
+	      oappend ("(bad)");
+	      return;
+	    }
+	}
       rbase = base + add;
 
       switch (modrm.mod)
@@ -15050,6 +15398,7 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != scalar_mode)
     {
       switch (vex.length)
@@ -15088,6 +15437,16 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      names = names_tmm;
+    }
+
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15212,6 +15571,7 @@ OP_EX (int bytemode, int sizeflag)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != d_scalar_swap_mode
       && bytemode != q_scalar_swap_mode
       && bytemode != vex_scalar_w_dq_mode)
@@ -15247,6 +15607,15 @@ OP_EX (int bytemode, int sizeflag)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      names = names_tmm;
+    }
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15802,6 +16171,17 @@ OP_VEX (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       return;
     }
 
+  if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+        {
+	  oappend ("(bad)");
+	  return;
+        }
+      oappend (names_tmm[reg]);
+      return;
+    }
+
   switch (vex.length)
     {
     case 128:
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 7230f87344..3334155071 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -297,6 +297,12 @@ static initializer cpu_flag_init[] =
     "CpuWAITPKG" },
   { "CPU_CLDEMOTE_FLAGS",
     "CpuCLDEMOTE" },
+  { "CPU_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_AMX_TILE_FLAGS",
+    "CpuAMX_TILE" },
   { "CPU_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_MOVDIR64B_FLAGS",
@@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =
     "CpuAVX512_BITALG" },
   { "CPU_ANY_AVX512_BF16_FLAGS",
     "CpuAVX512_BF16" },
+  { "CPU_ANY_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_ANY_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_ANY_AMX_TILE_FLAGS",
+    "CpuAMX_TILE|CpuAMX_INT8|CpuAMX_BF16" },
   { "CPU_ANY_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_ANY_MOVDIR64B_FLAGS",
@@ -459,6 +471,8 @@ static initializer operand_type_init[] =
     "Class=RegSIMD|Ymmword" },
   { "OPERAND_TYPE_REGZMM",
     "Class=RegSIMD|Zmmword" },
+  { "OPERAND_TYPE_REGTMM",
+    "Class=RegSIMD|Tmmword" },
   { "OPERAND_TYPE_REGMASK",
     "Class=RegMask" },
   { "OPERAND_TYPE_REGBND",
@@ -611,6 +625,9 @@ static bitfield cpu_flags[] =
   BITFIELD (CpuPCONFIG),
   BITFIELD (CpuWAITPKG),
   BITFIELD (CpuCLDEMOTE),
+  BITFIELD (CpuAMX_INT8),
+  BITFIELD (CpuAMX_BF16),
+  BITFIELD (CpuAMX_TILE),
   BITFIELD (CpuMOVDIRI),
   BITFIELD (CpuMOVDIR64B),
   BITFIELD (CpuENQCMD),
@@ -741,6 +758,7 @@ static bitfield operand_types[] =
   BITFIELD (Xmmword),
   BITFIELD (Ymmword),
   BITFIELD (Zmmword),
+  BITFIELD (Tmmword),
   BITFIELD (Unspecified),
 #ifdef OTUnused
   BITFIELD (OTUnused),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index c65febbe81..b8a6dfc25c 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -223,6 +223,12 @@ enum
   /* CET instructions support required */
   CpuIBT,
   CpuSHSTK,
+  /* AMX-INT8 instructions required */
+  CpuAMX_INT8,
+  /* AMX-BF16 instructions required */
+  CpuAMX_BF16,
+  /* AMX-TILE instructions required */
+  CpuAMX_TILE,
   /* GFNI instructions required */
   CpuGFNI,
   /* VAES instructions required */
@@ -372,6 +378,9 @@ typedef union i386_cpu_flags
       unsigned int cpuptwrite:1;
       unsigned int cpuibt:1;
       unsigned int cpushstk:1;
+      unsigned int cpuamx_int8:1;
+      unsigned int cpuamx_bf16:1;
+      unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
       unsigned int cpuvpclmulqdq:1;
@@ -574,7 +583,9 @@ enum
 #define VECSIB128	1
 #define VECSIB256	2
 #define VECSIB512	3
+#define SIBMEM		4
   SIB,
+
   /* SSE to AVX support required */
   SSE2AVX,
   /* No AVX equivalent */
@@ -702,7 +713,7 @@ typedef struct i386_opcode_modifier
   unsigned int vexw:2;
   unsigned int vexopcode:3;
   unsigned int vexsources:2;
-  unsigned int sib:2;
+  unsigned int sib:3;
   unsigned int sse2avx:1;
   unsigned int noavx:1;
   unsigned int evex:3;
@@ -807,6 +818,8 @@ enum
   Ymmword,
   /* ZMMWORD size.  */
   Zmmword,
+  /* TMMWORD size.  */
+  Tmmword,
   /* Unspecified memory size.  */
   Unspecified,
 
@@ -851,6 +864,7 @@ typedef union i386_operand_type
       unsigned int xmmword:1;
       unsigned int ymmword:1;
       unsigned int zmmword:1;
+      unsigned int tmmword:1;
       unsigned int unspecified:1;
 #ifdef OTUnused
       unsigned int unused:(OTNumOfBits - OTUnused);
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index cd6833c5ae..2a8ec52b41 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -52,6 +52,7 @@
 #define RegXMM Class=RegSIMD|Xmmword
 #define RegYMM Class=RegSIMD|Ymmword
 #define RegZMM Class=RegSIMD|Zmmword
+#define RegTMM Class=RegSIMD|Tmmword
 
 #define RegMask Class=RegMask
 
@@ -88,6 +89,7 @@
 #define VecSIB128 SIB=VECSIB128
 #define VecSIB256 SIB=VECSIB256
 #define VecSIB512 SIB=VECSIB512
+#define Sibmem SIB=SIBMEM|Modrm
 
 #define EVex128 EVex=EVEX128
 #define EVex256 EVex=EVEX256
@@ -4093,3 +4095,24 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|
 xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
 
 // TSXLDTRK instructions end.
+
+// AMX instructions.
+
+ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+
+tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+
+tileloadd, 2, 0xf24b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 2, 0x664b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 2, 0xf34b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }
+
+tilerelease, 0, 0x49c0, None, 2, CpuAMX_TILE|Cpu64, Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
+
+tilezero, 1, 0xf249, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM }
+
+// AMX instructions end.
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index cdff763ca7..ca7eeba488 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval
 zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval
 zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval
 zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval
+// TMM registers for AMX
+tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval
+tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval
+tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval
+tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval
+tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval
+tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval
+tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval
+tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval
 // Bound registers for MPX
 bnd0, Class=RegBND, 0, 0, Dw2Inval, Dw2Inval
 bnd1, Class=RegBND, 0, 1, Dw2Inval, Dw2Inval
-- 
2.17.1

Thanks,
Lili.
Jan Beulich July 8, 2020, 3:20 p.m. | #34
On 08.07.2020 10:49, Cui, Lili wrote:

Just two more small things, everything else looks good to me now:

> @@ -6584,12 +6619,18 @@ match_template (char mnem_suffix)
>  	  as_bad (_("unsupported instruction `%s'"),
>  		  current_templates->start->name);
>  	  return NULL;
> +	case invalid_sib_address:
> +	  err_msg = _("invalid SIB address");
> +	  break;
>  	case invalid_vsib_address:
>  	  err_msg = _("invalid VSIB address");
>  	  break;
>  	case invalid_vector_register_set:
>  	  err_msg = _("mask, index, and destination registers must be distinct");
>  	  break;
> +	case invalid_tmm_register_set:
> +	  err_msg = _("tmm register must be distinct");

Would you mind making this e.g. "all tmm registers must be distinct"? At
the very least the plural "registers" should be used imo.

> --- /dev/null
> +++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
> @@ -0,0 +1,40 @@
> +.text
> +	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
> +	.byte 0xc4
> +	.byte 0xe2
> +	.byte 0xd2
> +	.byte 0x5c
> +	.byte 0xdc
> +	.fill 0x05, 0x01, 0x90
> +	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.L = 1 (illegal value).
> +	.byte 0xc4
> +	.byte 0xe2
> +	.byte 0x56
> +	.byte 0x5c
> +	.byte 0xdc
> +	.fill 0x05, 0x01, 0x90
> +	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.R = 0 (illegal value).
> +	.byte 0xc4
> +	.byte 0x62
> +	.byte 0x52
> +	.byte 0x5c
> +	.byte 0xdc
> +	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
> +	.byte 0xc4
> +	.byte 0xc2
> +	.byte 0x52
> +	.byte 0x5c
> +	.byte 0xdc
> +	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.VVVV = 0110 (illegal value).
> +	.byte 0xc4
> +	.byte 0xe2
> +	.byte 0x32
> +	.byte 0x5c
> +	.byte 0xdc
> +	#tileloadd (%rax),%tmm1 set R/M= 001 (illegal value) without SIB.
> +	.byte 0xc4
> +	.byte 0xe2
> +	.byte 0x7b
> +	.byte 0x4b
> +	.byte 0x09

Depending on whether the disassembler also properly handles the
gather insn restrictions on register choice (I didn't check and
don't know offhand), I think you also want to verify that what

    tdpbssd %tmm1, %tmm1, %tmm0
    tdpbssd %tmm1, %tmm0, %tmm1
    tdpbssd %tmm0, %tmm1, %tmm1

would assemble to if there wasn't the error you've now added,
doesn't disassemble cleanly. (If there's no similar logic for
the gathers, then I wouldn't insist, but merely consider it a
nice-to-have).
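
For reference, the bytes those three insns would assemble to can be
worked out by hand. The following is only a minimal sketch, not binutils
code: encode_tdp is a hypothetical helper, and it assumes the register-only
form with a 3-byte VEX prefix, map 0F38, VEX.W = 0 and VEX.L = 0, as in the
opcode table entries of this patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of binutils: encode a register-only
   AMX dot-product insn (3-byte VEX, map 0F38, W=0, L=0).
   pp selects the implied prefix: 0 = none, 1 = 66, 2 = F3, 3 = F2.
   In AT&T syntax the operands read  insn %tmm<vvvv>, %tmm<rm>, %tmm<reg>.  */
static void
encode_tdp (uint8_t pp, uint8_t opcode, unsigned int vvvv,
	    unsigned int rm, unsigned int reg, uint8_t out[5])
{
  assert (pp < 4 && vvvv < 8 && rm < 8 && reg < 8);
  out[0] = 0xc4;		/* 3-byte VEX escape.  */
  out[1] = 0xe2;		/* R=X=B=1 (stored inverted), map 0F38.  */
  /* W=0, vvvv stored inverted in bits 6:3, L=0, pp in bits 1:0.  */
  out[2] = (uint8_t) (((~vvvv & 0xf) << 3) | pp);
  out[3] = opcode;
  /* ModRM with mod=3: reg in bits 5:3, rm in bits 2:0.  */
  out[4] = (uint8_t) (0xc0 | (reg << 3) | rm);
}
```

With pp = 3 and opcode 0x5e (tdpbssd), the three cases above come out as
c4 e2 73 5e c1, c4 e2 73 5e c8 and c4 e2 7b 5e c9 respectively, which are
the byte sequences a disassembler test would need to feed in via .byte.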

Thanks for your patience with my reviews.

Jan
Alan Modra via Binutils July 9, 2020, 5:24 a.m. | #35
> -----Original Message-----
> From: Jan Beulich <jbeulich@suse.com>
> Sent: Wednesday, July 8, 2020 11:20 PM
> To: Cui, Lili <lili.cui@intel.com>
> Cc: H.J. Lu <hjl.tools@gmail.com>; binutils@sourceware.org
> Subject: Re: x86: Add support for Intel AMX instructions
>
> On 08.07.2020 10:49, Cui, Lili wrote:
>
> Just two more small things, everything else looks good to me now:
>
> > @@ -6584,12 +6619,18 @@ match_template (char mnem_suffix)
> >  	  as_bad (_("unsupported instruction `%s'"),
> >  		  current_templates->start->name);
> >  	  return NULL;
> > +	case invalid_sib_address:
> > +	  err_msg = _("invalid SIB address");
> > +	  break;
> >  	case invalid_vsib_address:
> >  	  err_msg = _("invalid VSIB address");
> >  	  break;
> >  	case invalid_vector_register_set:
> >  	  err_msg = _("mask, index, and destination registers must be distinct");
> >  	  break;
> > +	case invalid_tmm_register_set:
> > +	  err_msg = _("tmm register must be distinct");
>
> Would you mind making this e.g. "all tmm registers must be distinct"? At the
> very least the plural "registers" should be used imo.
>

Changed it.

> > --- /dev/null
> > +++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
> > @@ -0,0 +1,40 @@
> > +	#tileloadd (%rax),%tmm1 set R/M= 001 (illegal value) without SIB.
> > +	.byte 0xc4
> > +	.byte 0xe2
> > +	.byte 0x7b
> > +	.byte 0x4b
> > +	.byte 0x09
>
> Depending on whether the disassembler also properly handles the gather
> insn restrictions on register choice (I didn't check and don't know offhand), I
> think you also want to verify that what
>
>     tdpbssd %tmm1, %tmm1, %tmm0
>     tdpbssd %tmm1, %tmm0, %tmm1
>     tdpbssd %tmm0, %tmm1, %tmm1
>
> would assemble to if there wasn't the error you've now added, doesn't
> disassemble cleanly. (If there's no similar logic for the gathers, then I
> wouldn't insist, but merely consider it a nice-to-have).
>

Added this check in disassembler.
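
To illustrate what such a check has to look at, here is a minimal sketch.
It is not the code added to i386-dis.c: tdp_operands_valid is a hypothetical
function that only handles the register-only, 3-byte VEX form and returns a
single yes/no answer, whereas the real disassembler prints (bad) per operand.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sketch, not the code in i386-dis.c: given the five bytes
   of a register-only AMX dot-product insn, report whether the encoding
   is well formed.  */
static bool
tdp_operands_valid (const uint8_t insn[5])
{
  /* VEX.vvvv is stored inverted in bits 6:3 of the third byte.  */
  unsigned int vvvv = ((insn[2] >> 3) & 0xf) ^ 0xf;
  unsigned int reg = (insn[4] >> 3) & 7;
  unsigned int rm = insn[4] & 7;

  assert (insn[0] == 0xc4);	/* Only the 3-byte VEX form is handled.  */

  if ((insn[1] & 0xa0) != 0xa0)	/* VEX.R and VEX.B must be 1 (inverted).  */
    return false;
  if (insn[2] & 0x84)		/* VEX.W and VEX.L must both be 0.  */
    return false;
  if ((insn[4] & 0xc0) != 0xc0)	/* Register form needs mod == 3.  */
    return false;
  if (vvvv > 7)			/* Only tmm0..tmm7 exist.  */
    return false;

  /* All three tile registers must be distinct.  */
  return vvvv != reg && vvvv != rm && reg != rm;
}
```

Fed the tdpbf16ps and tdpbuud byte sequences from x86-64-amx-bad.s, this
sketch rejects each of them, while it accepts c4 e2 60 5e ca
(tdpbuud %tmm3,%tmm2,%tmm1 in x86-64-amx.d).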

> Thanks for your patience with my reviews.
>
> Jan

Thank you for your patience as well. Here is the updated patch.


Subject: [PATCH] x86: Add support for Intel AMX instructions

gas/
	* doc/c-i386.texi: Document amx_int8, amx_bf16 and amx_tile.
	* config/tc-i386.c (i386_error): Add invalid_sib_address.
	(cpu_arch): Add .amx_int8, .amx_bf16 and .amx_tile.
	(cpu_noarch): Add noamx_int8, noamx_bf16 and noamx_tile.
	(match_simd_size): Add tmmword check.
	(operand_type_match): Add tmmword.
	(type_names): Add rTMM.
	(i386_error): Add invalid_tmm_register_set.
	(check_VecOperands): Handle invalid_sib_address and
	invalid_tmm_register_set.
	(match_template): Handle invalid_sib_address.
	(build_modrm_byte): Handle non-vector SIB and tmmword.
	(i386_index_check): Disallow RegIP for non-vector SIB.
	(check_register): Handle tmmword.
	* testsuite/gas/i386/i386.exp: Add new AMX tests.
	* testsuite/gas/i386/intel-regs.d: Add tmm.
	* testsuite/gas/i386/intel-regs.s: Add tmm.
	* testsuite/gas/i386/x86-64-amx-intel.d: New.
	* testsuite/gas/i386/x86-64-amx-inval.l: New.
	* testsuite/gas/i386/x86-64-amx-inval.s: New.
	* testsuite/gas/i386/x86-64-amx.d: New.
	* testsuite/gas/i386/x86-64-amx.s: New.
	* testsuite/gas/i386/x86-64-amx-bad.d: New.
	* testsuite/gas/i386/x86-64-amx-bad.s: New.

opcodes/
	* i386-dis.c (TMM): New.
	(EXtmm): Likewise.
	(VexTmm): Likewise.
	(tmm_mode): Likewise.
	(MVexSIBMEM): Likewise.
	(vex_sibmem_mode): Likewise.
	(OP_VEX_TMM_Fixup): Likewise.
	(REG_VEX_0F3849_P_0_W_0_M_1): Likewise.
	(MOD_VEX_0F3849_P_0_W_0): Likewise.
	(MOD_VEX_0F3849_P_2_W_0): Likewise.
	(MOD_VEX_0F3849_P_3_W_0): Likewise.
	(MOD_VEX_0F384B_P_1_W_0): Likewise.
	(MOD_VEX_0F384B_P_2_W_0): Likewise.
	(MOD_VEX_0F384B_P_3_W_0): Likewise.
	(MOD_VEX_0F385C_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_0_W_0): Likewise.
	(MOD_VEX_0F385E_P_1_W_0): Likewise.
	(MOD_VEX_0F385E_P_2_W_0): Likewise.
	(MOD_VEX_0F385E_P_3_W_0): Likewise.
	(RM_VEX_0F3849_P_0_W_0_M_1_R_0): Likewise.
	(PREFIX_VEX_0F3849): Likewise.
	(PREFIX_VEX_0F384B): Likewise.
	(PREFIX_VEX_0F385C): Likewise.
	(PREFIX_VEX_0F385E): Likewise.
	(X86_64_0F01_REG_3): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F3849_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F384B_P_3_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385C_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_0_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_1_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_2_W_0_M_0_L_0): Likewise.
	(X86_64_VEX_0F385E_P_3_W_0_M_0_L_0): Likewise.
	(VEX_W_0F3849_P_0): Likewise.
	(VEX_W_0F3849_P_2): Likewise.
	(VEX_W_0F3849_P_3): Likewise.
	(VEX_W_0F384B_P_1): Likewise.
	(VEX_W_0F384B_P_2): Likewise.
	(VEX_W_0F384B_P_3): Likewise.
	(VEX_W_0F385C_P_1): Likewise.
	(VEX_W_0F385E_P_0): Likewise.
	(VEX_W_0F385E_P_1): Likewise.
	(VEX_W_0F385E_P_2): Likewise.
	(VEX_W_0F385E_P_3): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0): Likewise.
	(VEX_LEN_0F3849_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F3849_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F384B_P_3_W_0_M_0): Likewise.
	(VEX_LEN_0F385C_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_0_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_1_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_2_W_0_M_0): Likewise.
	(VEX_LEN_0F385E_P_3_W_0_M_0): Likewise.
	(names_tmm): Likewise.
	(att_names_tmm): Likewise.
	(intel_operand_size): Handle void_mode.
	(OP_XMM): Handle tmm_mode.
	(OP_EX): Likewise.
	(OP_VEX): Likewise.
	* i386-gen.c (cpu_flag_init): Add entries for
	CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(operand_type_shorthands): Add RegTMM.
	(operand_type_init): Likewise.
	(operand_types): Add Tmmword.
	(cpu_flag_init): Add CPU_AMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	(cpu_flags): Add CpuAMX_INT8, CpuAMX_BF16 and CpuAMX_TILE.
	* i386-opc.h (CpuAMX_INT8): New.
	(CpuAMX_BF16): Likewise.
	(CpuAMX_TILE): Likewise.
	(SIBMEM): Likewise.
	(Tmmword): Likewise.
	(i386_cpu_flags): Add cpuamx_int8, cpuamx_bf16 and cpuamx_tile.
	(i386_opcode_modifier): Extend width of fields vexvvvv and sib.
	(i386_operand_type): Add tmmword.
	* i386-opc.tbl: Add AMX instructions.
	* i386-reg.tbl: Add AMX registers.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
---
 gas/config/tc-i386.c                      |  97 ++++-
 gas/doc/c-i386.texi                       |   7 +
 gas/testsuite/gas/i386/i386.exp           |   4 +
 gas/testsuite/gas/i386/intel-regs.d       |   4 +
 gas/testsuite/gas/i386/intel-regs.s       |   4 +
 gas/testsuite/gas/i386/x86-64-amx-bad.d   |  24 ++
 gas/testsuite/gas/i386/x86-64-amx-bad.s   |  63 ++++
 gas/testsuite/gas/i386/x86-64-amx-intel.d |  70 ++++
 gas/testsuite/gas/i386/x86-64-amx-inval.l |  17 +
 gas/testsuite/gas/i386/x86-64-amx-inval.s |  22 ++
 gas/testsuite/gas/i386/x86-64-amx.d       |  70 ++++
 gas/testsuite/gas/i386/x86-64-amx.s       |  61 +++
 opcodes/i386-dis.c                        | 439 +++++++++++++++++++++-
 opcodes/i386-gen.c                        |  18 +
 opcodes/i386-opc.h                        |  16 +-
 opcodes/i386-opc.tbl                      |  23 ++
 opcodes/i386-reg.tbl                      |   9 +
 17 files changed, 929 insertions(+), 19 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-bad.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-intel.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-amx.s

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 2e0eb24753..537c0076ca 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -290,8 +290,10 @@ enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
+    invalid_tmm_register_set,
     unsupported_vector_index_register,
     unsupported_broadcast,
     broadcast_needed,
@@ -372,6 +374,9 @@ struct _i386_insn
     /* Has ZMM register operands.  */
     bfd_boolean has_regzmm;
 
+    /* Has TMM register operands.  */
+    bfd_boolean has_regtmm;
+
     /* Has GOTPC or TLS relocation.  */
     bfd_boolean has_gotpc_tls_reloc;
 
@@ -1201,6 +1206,12 @@ static const arch_entry cpu_arch[] =
     CPU_WAITPKG_FLAGS, 0 },
   { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,
     CPU_CLDEMOTE_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_int8"), PROCESSOR_UNKNOWN,
+    CPU_AMX_INT8_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_bf16"), PROCESSOR_UNKNOWN,
+    CPU_AMX_BF16_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_tile"), PROCESSOR_UNKNOWN,
+    CPU_AMX_TILE_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdiri"), PROCESSOR_UNKNOWN,
     CPU_MOVDIRI_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdir64b"), PROCESSOR_UNKNOWN,
@@ -1259,6 +1270,9 @@ static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },
   { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },
   { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },
+  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },
+  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },
+  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },
   { STRING_COMMA_LEN ("nomovdiri"), CPU_ANY_MOVDIRI_FLAGS },
   { STRING_COMMA_LEN ("nomovdir64b"), CPU_ANY_MOVDIR64B_FLAGS },
   { STRING_COMMA_LEN ("noavx512_bf16"), CPU_ANY_AVX512_BF16_FLAGS },
@@ -2159,7 +2173,9 @@ match_simd_size (const insn_template *t, unsigned int wanted,
 	   || (i.types[given].bitfield.ymmword
 	       && !t->operand_types[wanted].bitfield.ymmword)
 	   || (i.types[given].bitfield.zmmword
-	       && !t->operand_types[wanted].bitfield.zmmword));
+	       && !t->operand_types[wanted].bitfield.zmmword)
+	   || (i.types[given].bitfield.tmmword
+	       && !t->operand_types[wanted].bitfield.tmmword));
 }
 
 /* Return 1 if there is no conflict in any size between operand GIVEN
@@ -2296,6 +2312,7 @@ operand_type_match (i386_operand_type overlap,
   temp.bitfield.xmmword = 0;
   temp.bitfield.ymmword = 0;
   temp.bitfield.zmmword = 0;
+  temp.bitfield.tmmword = 0;
   if (operand_type_all_zero (&temp))
     goto mismatch;
 
@@ -3304,6 +3321,7 @@ const type_names[] =
   { OPERAND_TYPE_REGXMM, "rXMM" },
   { OPERAND_TYPE_REGYMM, "rYMM" },
   { OPERAND_TYPE_REGZMM, "rZMM" },
+  { OPERAND_TYPE_REGTMM, "rTMM" },
   { OPERAND_TYPE_REGMASK, "Mask reg" },
 };
 
@@ -5790,7 +5808,7 @@ check_VecOperands (const insn_template *t)
 
   /* For VSIB byte, we need a vector register for index, and all vector
      registers must be distinct.  */
-  if (t->opcode_modifier.sib)
+  if (t->opcode_modifier.sib && t->opcode_modifier.sib != SIBMEM)
     {
       if (!i.index_reg
 	  || !((t->opcode_modifier.sib == VECSIB128
@@ -5849,6 +5867,23 @@ check_VecOperands (const insn_template *t)
 	}
     }
 
+  /* For AMX instructions with three tmmword operands, all tmmword operands must be
+     distinct.  */
+  if (t->operand_types[0].bitfield.tmmword
+      && i.reg_operands == 3)
+    {
+      if (register_number (i.op[0].regs)
+          == register_number (i.op[1].regs)
+          || register_number (i.op[0].regs)
+             == register_number (i.op[2].regs)
+          || register_number (i.op[1].regs)
+             == register_number (i.op[2].regs))
+	{
+	  i.error = invalid_tmm_register_set;
+	  return 1;
+	}
+    }
+
   /* Check if broadcast is supported by the instruction and is applied
      to the memory operand.  */
   if (i.broadcast)
@@ -6584,12 +6619,18 @@ match_template (char mnem_suffix)
 	  as_bad (_("unsupported instruction `%s'"),
 		  current_templates->start->name);
 	  return NULL;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
 	case invalid_vsib_address:
 	  err_msg = _("invalid VSIB address");
 	  break;
 	case invalid_vector_register_set:
 	  err_msg = _("mask, index, and destination registers must be distinct");
 	  break;
+	case invalid_tmm_register_set:
+	  err_msg = _("all tmm registers must be distinct");
+	  break;
 	case unsupported_vector_index_register:
 	  err_msg = _("unsupported vector index register");
 	  break;
@@ -7923,8 +7964,11 @@ build_modrm_byte (void)
 	  else if (i.op[dest].regs->reg_type.bitfield.class == RegSIMD
 		   || i.op[source].regs->reg_type.bitfield.class == RegSIMD)
 	    {
-	      if (i.types[dest].bitfield.zmmword
-		  || i.types[source].bitfield.zmmword)
+	      if (i.types[dest].bitfield.tmmword
+		  || i.types[source].bitfield.tmmword)
+		i.has_regtmm = TRUE;
+	      else if (i.types[dest].bitfield.zmmword
+		       || i.types[source].bitfield.zmmword)
 		i.has_regzmm = TRUE;
 	      else if (i.types[dest].bitfield.ymmword
 		       || i.types[source].bitfield.ymmword)
@@ -7966,7 +8010,9 @@ build_modrm_byte (void)
 
 	  if (i.tm.opcode_modifier.sib)
 	    {
-	      if (i.index_reg->reg_num == RegIZ)
+	      /* The index register of VSIB shouldn't be RegIZ.  */
+	      if (i.tm.opcode_modifier.sib != SIBMEM
+		  && i.index_reg->reg_num == RegIZ)
 		abort ();
 
 	      i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
@@ -7989,8 +8035,19 @@ build_modrm_byte (void)
 		      i.types[op].bitfield.disp32s = 1;
 		    }
 		}
-	      i.sib.index = i.index_reg->reg_num;
-	      set_rex_vrex (i.index_reg, REX_X, FALSE);
+
+	      /* Since the mandatory SIB always has an index register,
+		 the code logic remains unchanged.  The non-mandatory SIB
+		 without an index register is allowed and will be handled
+		 later.  */
+	      if (i.index_reg)
+		{
+		  if (i.index_reg->reg_num == RegIZ)
+		    i.sib.index = NO_INDEX_REGISTER;
+		  else
+		    i.sib.index = i.index_reg->reg_num;
+		  set_rex_vrex (i.index_reg, REX_X, FALSE);
+		}
 	    }
 
 	  default_seg = &ds;
@@ -8004,7 +8061,9 @@ build_modrm_byte (void)
 		{
 		  i386_operand_type newdisp;
 
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Both check for VSIB and mandatory non-vector SIB. */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);
 		  /* Operand is just <disp>  */
 		  if (flag_code == CODE_64BIT)
 		    {
@@ -8142,7 +8201,11 @@ build_modrm_byte (void)
 	      i.sib.scale = i.log2_scale_factor;
 	      if (i.index_reg == 0)
 		{
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only check for VSIB. */
+		  gas_assert (i.tm.opcode_modifier.sib != VECSIB128
+			      && i.tm.opcode_modifier.sib != VECSIB256
+			      && i.tm.opcode_modifier.sib != VECSIB512);
+
 		  /* <disp>(%esp) becomes two byte modrm with no index
 		     register.  We've already stored the code for esp
 		     in i.rm.regmem ie. ESCAPE_TO_TWO_BYTE_ADDRESSING.
@@ -8267,7 +8330,9 @@ build_modrm_byte (void)
 		break;
 	      if (i.types[op].bitfield.class == RegSIMD)
 		{
-		  if (i.types[op].bitfield.zmmword)
+		  if (i.types[op].bitfield.tmmword)
+		    i.has_regtmm = TRUE;
+		  else if (i.types[op].bitfield.zmmword)
 		    i.has_regzmm = TRUE;
 		  else if (i.types[op].bitfield.ymmword)
 		    i.has_regymm = TRUE;
@@ -10926,9 +10991,10 @@ i386_index_check (const char *operand_string)
 		      || !i.index_reg->reg_type.bitfield.baseindex)))
 	    goto bad_address;
 
-	  /* bndmk, bndldx, and bndstx have special restrictions. */
+	  /* bndmk, bndldx, bndstx and mandatory non-vector SIB have special restrictions. */
 	  if (current_templates->start->base_opcode == 0xf30f1b
-	      || (current_templates->start->base_opcode & ~1) == 0x0f1a)
+	      || (current_templates->start->base_opcode & ~1) == 0x0f1a
+	      || current_templates->start->opcode_modifier.sib == SIBMEM)
 	    {
 	      /* They cannot use RIP-relative addressing. */
 	      if (i.base_reg && i.base_reg->reg_num == RegIP)
@@ -10938,7 +11004,7 @@ i386_index_check (const char *operand_string)
 		}
 
 	      /* bndldx and bndstx ignore their scale factor. */
-	      if (current_templates->start->base_opcode != 0xf30f1b
+	      if ((current_templates->start->base_opcode & ~1) == 0x0f1a
 		  && i.log2_scale_factor)
 		as_warn (_("register scaling is being ignored here"));
 	    }
@@ -12440,6 +12506,11 @@ static bfd_boolean check_register (const reg_entry *r)
 	}
     }
 
+  if (r->reg_type.bitfield.tmmword
+      && (!cpu_arch_flags.bitfield.cpuamx_tile
+          || flag_code != CODE_64BIT))
+    return FALSE;
+
   if (r->reg_type.bitfield.class == RegBND && !cpu_arch_flags.bitfield.cpumpx)
     return FALSE;
 
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index f3183f1ddb..3813f5eb59 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -226,6 +226,12 @@ accept various extension mnemonics.  For example,
 @code{noenqcmd},
 @code{noserialize},
 @code{notsxldtrk},
+@code{amx_int8},
+@code{noamx_int8},
+@code{amx_bf16},
+@code{noamx_bf16},
+@code{amx_tile},
+@code{noamx_tile},
 @code{vmx},
 @code{vmfunc},
 @code{smx},
@@ -1494,6 +1500,7 @@ supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
+@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 37aa39698c..4feb9e76ca 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -1139,6 +1139,10 @@ if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"
+    run_list_test "x86-64-amx-inval"
+    run_dump_test "x86-64-amx"
+    run_dump_test "x86-64-amx-intel"
+    run_dump_test "x86-64-amx-bad"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/intel-regs.d b/gas/testsuite/gas/i386/intel-regs.d
index 65bcb6ca7d..480b291c91 100644
--- a/gas/testsuite/gas/i386/intel-regs.d
+++ b/gas/testsuite/gas/i386/intel-regs.d
@@ -6,6 +6,7 @@
 
 Disassembly of section \.text:
 0+0 <.*>:
+.*[ 	]+R_386_32[ 	]+tmm1
 .*[ 	]+R_386_16[ 	]+eax
 .*[ 	]+R_386_16[ 	]+rax
 .*[ 	]+R_386_16[ 	]+axl
@@ -53,4 +54,7 @@ Disassembly of section \.text:
 
 .* <ymm8>:
 .*[ 	]+<ymm8>
+
+.* <tmm0>:
+.*[ 	]+<tmm0>
 #pass
diff --git a/gas/testsuite/gas/i386/intel-regs.s b/gas/testsuite/gas/i386/intel-regs.s
index 66ab16dfc5..44e369bb0f 100644
--- a/gas/testsuite/gas/i386/intel-regs.s
+++ b/gas/testsuite/gas/i386/intel-regs.s
@@ -1,6 +1,8 @@
 	.text
 	.intel_syntax noprefix
 
+	mov	eax, tmm1
+
 	.arch i286
 	.code16
 	mov	ax, eax			; add	[bx+si], al
@@ -59,3 +61,5 @@
 	mov	rax, r8
 ymm8:
 	jmp	ymm8
+tmm0:
+	jmp	tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.d b/gas/testsuite/gas/i386/x86-64-amx-bad.d
new file mode 100644
index 0000000000..c2c622e4f6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.d
@@ -0,0 +1,24 @@
+#as:
+#objdump: -drw
+#name: x86_64 AMX insns
+#source: x86-64-amx-bad.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <\.text>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 d2 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 56 5c[ 	]*\(bad\)[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*dc 90 90 90 90 90[ 	]*fcoml.*
+[ 	]*[a-f0-9]+:[ 	]*c4 62 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,\(bad\)
+[ 	]*[a-f0-9]+:[ 	]*c4 c2 52 5c dc[ 	]*tdpbf16ps %tmm5,\(bad\),%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 32 5c dc[ 	]*tdpbf16ps \(bad\),%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 09[ 	]*tileloadd \(bad\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 70 5e c9[ 	]*tdpbuud \(bad\),\(bad\),\(bad\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 5e c9[ 	]*tdpbuud %tmm0,\(bad\),\(bad\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 70 5e c8[ 	]*tdpbuud \(bad\),%tmm0,\(bad\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 70 5e c1[ 	]*tdpbuud \(bad\),\(bad\),%tmm0
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-bad.s b/gas/testsuite/gas/i386/x86-64-amx-bad.s
new file mode 100644
index 0000000000..2781553cb4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-bad.s
@@ -0,0 +1,63 @@
+.text
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.W = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0xd2
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.L = 1 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x56
+	.byte 0x5c
+	.byte 0xdc
+	.fill 0x05, 0x01, 0x90
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.R = 0 (illegal value).
+	.byte 0xc4
+	.byte 0x62
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.B = 0 (illegal value).
+	.byte 0xc4
+	.byte 0xc2
+	.byte 0x52
+	.byte 0x5c
+	.byte 0xdc
+	#tdpbf16ps %tmm5,%tmm4,%tmm3 set VEX.VVVV = 0110 (illegal value).
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x32
+	.byte 0x5c
+	.byte 0xdc
+	#tileloadd (%rax),%tmm1 set R/M= 001 (illegal value) without SIB.
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x7b
+	.byte 0x4b
+	.byte 0x09
+	#tdpbuud %tmm1,%tmm1,%tmm1 All 3 TMM registers can't be identical.
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x70
+	.byte 0x5e
+	.byte 0xc9
+	#tdpbuud %tmm0,%tmm1,%tmm1 All 3 TMM registers can't be identical.
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x78
+	.byte 0x5e
+	.byte 0xc9
+	#tdpbuud %tmm1,%tmm0,%tmm1 All 3 TMM registers can't be identical.
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x70
+	.byte 0x5e
+	.byte 0xc8
+	#tdpbuud %tmm1,%tmm1,%tmm0 All 3 TMM registers can't be identical.
+	.byte 0xc4
+	.byte 0xe2
+	.byte 0x70
+	.byte 0x5e
+	.byte 0xc1
diff --git a/gas/testsuite/gas/i386/x86-64-amx-intel.d b/gas/testsuite/gas/i386/x86-64-amx-intel.d
new file mode 100644
index 0000000000..fc5e0745ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d
@@ -0,0 +1,70 @@
+#as:
+#objdump: -d -Mintel
+#name: x86_64 AMX insns in Intel syntax
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.l b/gas/testsuite/gas/i386/x86-64-amx-inval.l
new file mode 100644
index 0000000000..6757b780ea
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.l
@@ -0,0 +1,17 @@
+.* Assembler messages:
+.*:5: Error: `\(%rip\)' cannot be used here
+.*:6: Error: `\(%rip\)' cannot be used here
+.*:7: Error: `\(%rip\)' cannot be used here
+.*:8: Error: operand size mismatch for `tdpbssd'
+.*:9: Error: operand size mismatch for `vaddps'
+.*:10: Error: all tmm registers must be distinct for `tdpbssd'
+.*:11: Error: all tmm registers must be distinct for `tdpbssd'
+.*:12: Error: all tmm registers must be distinct for `tdpbssd'
+.*:15: Error: `\[rip\]' cannot be used here
+.*:16: Error: `\[rip\]' cannot be used here
+.*:17: Error: `\[rip\]' cannot be used here
+.*:18: Error: operand size mismatch for `tdpbssd'
+.*:19: Error: operand size mismatch for `vaddps'
+.*:20: Error: all tmm registers must be distinct for `tdpbssd'
+.*:21: Error: all tmm registers must be distinct for `tdpbssd'
+.*:22: Error: all tmm registers must be distinct for `tdpbssd'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-inval.s b/gas/testsuite/gas/i386/x86-64-amx-inval.s
new file mode 100644
index 0000000000..6e29453669
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-inval.s
@@ -0,0 +1,22 @@
+# Check illegal SIBMEM and register size used in AMX instructions
+
+    .text
+_start:
+    tileloadd (%rip), %tmm1
+    tileloaddt1 (%rip), %tmm1
+    tilestored %tmm1, (%rip)
+    tdpbssd %xmm1, %xmm2, %xmm3
+    vaddps %tmm1, %tmm2, %tmm3
+    tdpbssd %tmm1, %tmm1, %tmm0
+    tdpbssd %tmm1, %tmm0, %tmm1
+    tdpbssd %tmm0, %tmm1, %tmm1
+
+    .intel_syntax noprefix
+    tileloadd tmm1, [rip]
+    tileloaddt1 tmm1, [rip]
+    tilestored [rip], tmm1
+    tdpbssd xmm3, xmm2, xmm1
+    vaddps %tmm1, %tmm2, %tmm3
+    tdpbssd tmm0, tmm1, tmm1
+    tdpbssd tmm1, tmm0, tmm1
+    tdpbssd tmm1, tmm1, tmm0
diff --git a/gas/testsuite/gas/i386/x86-64-amx.d b/gas/testsuite/gas/i386/x86-64-amx.d
new file mode 100644
index 0000000000..ad6f42240b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.d
@@ -0,0 +1,70 @@
+#as:
+#objdump: -d
+#name: x86_64 AMX insns
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-amx.s b/gas/testsuite/gas/i386/x86-64-amx.s
new file mode 100644
index 0000000000..c70543152b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.s
@@ -0,0 +1,61 @@
+
+  .allow_index_reg
+  .text
+_start:
+  ldtilecfg  (%rcx,%rdx,2)
+  sttilecfg  (%rcx,%rdx,2)
+  tdpbf16ps %tmm5, %tmm4, %tmm3
+  tdpbssd %tmm3, %tmm2, %tmm1
+  tdpbsud %tmm3, %tmm2, %tmm1
+  tdpbusd %tmm3, %tmm2, %tmm1
+  tdpbuud %tmm3, %tmm2, %tmm1
+  tileloadd foo, %tmm5
+  tileloadd (%rcx), %tmm5
+  tileloadd (%ecx), %tmm5
+  tileloadd (%rcx,%rdx,1), %tmm5
+  tileloadd (%ecx,%edx,2), %tmm1
+  tileloaddt1 foo, %tmm5
+  tileloaddt1 (%rcx), %tmm5
+  tileloaddt1 (%ecx), %tmm5
+  tileloaddt1 (%rcx,%rdx,1), %tmm5
+  tileloaddt1 (%ecx,%edx,2), %tmm1
+  tileloaddt1 (%rcx,%riz,2), %tmm1
+  tilerelease
+  tilestored %tmm5, (%rcx)
+  tilestored %tmm5, (%ecx)
+  tilestored %tmm5, (%rcx,%rdx,1)
+  tilestored %tmm1, (%ecx,%edx,2)
+  tilezero %tmm0
+  tilezero %tmm5
+  tilezero %tmm7
+
+
+  .intel_syntax noprefix
+  ldtilecfg  [rcx]
+  ldtilecfg  [rbx]
+  sttilecfg  [rcx]
+  sttilecfg  [rbx]
+  tdpbf16ps tmm3, tmm4, tmm5
+  tdpbssd tmm1, tmm2, tmm3
+  tdpbsud tmm1, tmm2, tmm3
+  tdpbusd tmm1, tmm2, tmm3
+  tdpbuud tmm1, tmm2, tmm3
+  tileloadd tmm5, foo
+  tileloadd tmm5, [rcx]
+  tileloadd tmm5, [ecx]
+  tileloadd tmm5, [rcx+rdx]
+  tileloadd tmm1, [ecx+edx*2]
+  tileloaddt1 tmm5, foo
+  tileloaddt1 tmm5, [rcx]
+  tileloaddt1 tmm5, [ecx]
+  tileloaddt1 tmm5, [rcx+rdx]
+  tileloaddt1 tmm1, [ecx+edx*2]
+  tileloaddt1 tmm1, [rcx+riz*2]
+  tilerelease
+  tilestored [rcx], tmm5
+  tilestored [ecx], tmm5
+  tilestored [rcx+rdx], tmm5
+  tilestored [ecx+edx*2], tmm1
+  tilezero tmm0
+  tilezero tmm5
+  tilezero tmm7
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 156d45d9b9..547b70071a 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -88,6 +88,7 @@ static void OP_MS (int, int);
 static void OP_XS (int, int);
 static void OP_M (int, int);
 static void OP_VEX (int, int);
+static void OP_VEX_TMM_Fixup (int, int);
 static void OP_VexW (int, int);
 static void OP_EX_Vex (int, int);
 static void OP_XMM_Vex (int, int);
@@ -370,6 +371,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define XMScalar { OP_XMM, scalar_mode }
 #define XMGatherQ { OP_XMM, vex_vsib_q_w_dq_mode }
 #define XMM { OP_XMM, xmm_mode }
+#define TMM { OP_XMM, tmm_mode }
 #define XMxmmq { OP_XMM, xmmq_mode }
 #define EM { OP_EM, v_mode }
 #define EMS { OP_EM, v_swap_mode }
@@ -386,6 +388,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXxS { OP_EX, x_swap_mode }
 #define EXxmm { OP_EX, xmm_mode }
 #define EXymm { OP_EX, ymm_mode }
+#define EXtmm { OP_EX, tmm_mode }
 #define EXxmmq { OP_EX, xmmq_mode }
 #define EXEvexHalfBcstXmmq { OP_EX, evex_half_bcst_xmmq_mode }
 #define EXxmm_mb { OP_EX, xmm_mb_mode }
@@ -415,6 +418,7 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Vex128 { OP_VEX, vex128_mode }
 #define Vex256 { OP_VEX, vex256_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define VexTmm { OP_VEX_TMM_Fixup, tmm_mode }
 #define EXdVexScalarS { OP_EX_Vex, d_scalar_swap_mode }
 #define EXqVexScalarS { OP_EX_Vex, q_scalar_swap_mode }
 #define XMVexScalar { OP_XMM_Vex, scalar_mode }
@@ -442,6 +446,8 @@ fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define MVexVSIBQWpX { OP_M, vex_vsib_q_w_dq_mode }
 #define MVexVSIBQDWpX { OP_M, vex_vsib_q_w_d_mode }
 
+#define MVexSIBMEM { OP_M, vex_sibmem_mode }
+
 /* Used handle "rep" prefix for string instructions.  */
 #define Xbr { REP_Fixup, eSI_reg }
 #define Xvr { REP_Fixup, eSI_reg }
@@ -533,6 +539,8 @@ enum
   ymmq_mode,
   /* 32-byte YMM or 16-byte word operand */
   ymmxmm_mode,
+  /* TMM operand */
+  tmm_mode,
   /* d_mode in 32bit, q_mode in 64bit mode.  */
   m_mode,
   /* pair of v_mode operands */
@@ -586,6 +594,8 @@ enum
   vex_vsib_q_w_dq_mode,
   /* Similar to vex_vsib_q_w_dq_mode, with smaller memory.  */
   vex_vsib_q_w_d_mode,
+  /* Mandatory non-vector SIB.  */
+  vex_sibmem_mode,
 
   /* scalar, ignore vector length.  */
   scalar_mode,
@@ -734,6 +744,7 @@ enum
   REG_VEX_0F72,
   REG_VEX_0F73,
   REG_VEX_0FAE,
+  REG_VEX_0F3849_P_0_W_0_M_1,
   REG_VEX_0F38F3,
 
   REG_0FXOP_09_01_L_0,
@@ -818,6 +829,17 @@ enum
   MOD_0FE7_PREFIX_2,
   MOD_0FF0_PREFIX_3,
   MOD_0F382A_PREFIX_2,
+  MOD_VEX_0F3849_P_0_W_0,
+  MOD_VEX_0F3849_P_2_W_0,
+  MOD_VEX_0F3849_P_3_W_0,
+  MOD_VEX_0F384B_P_1_W_0,
+  MOD_VEX_0F384B_P_2_W_0,
+  MOD_VEX_0F384B_P_3_W_0,
+  MOD_VEX_0F385C_P_1_W_0,
+  MOD_VEX_0F385E_P_0_W_0,
+  MOD_VEX_0F385E_P_1_W_0,
+  MOD_VEX_0F385E_P_2_W_0,
+  MOD_VEX_0F385E_P_3_W_0,
   MOD_0F38F5_PREFIX_2,
   MOD_0F38F6_PREFIX_0,
   MOD_0F38F8_PREFIX_1,
@@ -957,6 +979,7 @@ enum
   RM_0F1E_P_1_MOD_3_REG_7,
   RM_0FAE_REG_6_MOD_3_P_0,
   RM_0FAE_REG_7_MOD_3,
+  RM_VEX_0F3849_P_0_W_0_M_1_R_0
 };
 
 enum
@@ -1292,9 +1315,13 @@ enum
   PREFIX_VEX_0F3845,
   PREFIX_VEX_0F3846,
   PREFIX_VEX_0F3847,
+  PREFIX_VEX_0F3849,
+  PREFIX_VEX_0F384B,
   PREFIX_VEX_0F3858,
   PREFIX_VEX_0F3859,
   PREFIX_VEX_0F385A,
+  PREFIX_VEX_0F385C,
+  PREFIX_VEX_0F385E,
   PREFIX_VEX_0F3878,
   PREFIX_VEX_0F3879,
   PREFIX_VEX_0F388C,
@@ -1667,7 +1694,19 @@ enum
   X86_64_0F01_REG_0,
   X86_64_0F01_REG_1,
   X86_64_0F01_REG_2,
-  X86_64_0F01_REG_3
+  X86_64_0F01_REG_3,
+  X86_64_VEX_0F3849_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0,
+  X86_64_VEX_0F3849_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F3849_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F384B_P_3_W_0_M_0_L_0,
+  X86_64_VEX_0F385C_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_0_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_1_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_2_W_0_M_0_L_0,
+  X86_64_VEX_0F385E_P_3_W_0_M_0_L_0
 };
 
 enum
@@ -1752,7 +1791,19 @@ enum
   VEX_LEN_0F381A_P_2_M_0,
   VEX_LEN_0F3836_P_2,
   VEX_LEN_0F3841_P_2,
+  VEX_LEN_0F3849_P_0_W_0_M_0,
+  VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0,
+  VEX_LEN_0F3849_P_2_W_0_M_0,
+  VEX_LEN_0F3849_P_3_W_0_M_0,
+  VEX_LEN_0F384B_P_1_W_0_M_0,
+  VEX_LEN_0F384B_P_2_W_0_M_0,
+  VEX_LEN_0F384B_P_3_W_0_M_0,
   VEX_LEN_0F385A_P_2_M_0,
+  VEX_LEN_0F385C_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_0_W_0_M_0,
+  VEX_LEN_0F385E_P_1_W_0_M_0,
+  VEX_LEN_0F385E_P_2_W_0_M_0,
+  VEX_LEN_0F385E_P_3_W_0_M_0,
   VEX_LEN_0F38DB_P_2,
   VEX_LEN_0F38F2_P_0,
   VEX_LEN_0F38F3_R_1_P_0,
@@ -1960,9 +2011,20 @@ enum
   VEX_W_0F382F_P_2_M_0,
   VEX_W_0F3836_P_2,
   VEX_W_0F3846_P_2,
+  VEX_W_0F3849_P_0,
+  VEX_W_0F3849_P_2,
+  VEX_W_0F3849_P_3,
+  VEX_W_0F384B_P_1,
+  VEX_W_0F384B_P_2,
+  VEX_W_0F384B_P_3,
   VEX_W_0F3858_P_2,
   VEX_W_0F3859_P_2,
   VEX_W_0F385A_P_2_M_0,
+  VEX_W_0F385C_P_1,
+  VEX_W_0F385E_P_0,
+  VEX_W_0F385E_P_1,
+  VEX_W_0F385E_P_2,
+  VEX_W_0F385E_P_3,
   VEX_W_0F3878_P_2,
   VEX_W_0F3879_P_2,
   VEX_W_0F38CF_P_2,
@@ -2954,6 +3016,14 @@ vex;
 static unsigned char need_vex;
 static unsigned char need_vex_reg;
 
+static struct
+  {
+    int tmm0;
+    int tmm1;
+    int tmm2;
+  }
+amx_operands;
+
 struct op
   {
     const char *name;
@@ -3116,6 +3186,16 @@ static const char *att_names_zmm[] = {
   "%zmm28", "%zmm29", "%zmm30", "%zmm31"
 };
 
+static const char **names_tmm;
+static const char *intel_names_tmm[] = {
+  "tmm0", "tmm1", "tmm2", "tmm3",
+  "tmm4", "tmm5", "tmm6", "tmm7"
+};
+static const char *att_names_tmm[] = {
+  "%tmm0", "%tmm1", "%tmm2", "%tmm3",
+  "%tmm4", "%tmm5", "%tmm6", "%tmm7"
+};
+
 static const char **names_mask;
 static const char *intel_names_mask[] = {
   "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"
@@ -3484,6 +3564,10 @@ static const struct dis386 reg_table[][8] = {
     { MOD_TABLE (MOD_VEX_0FAE_REG_2) },
     { MOD_TABLE (MOD_VEX_0FAE_REG_3) },
   },
+  /* REG_VEX_0F3849_P_0_W_0_M_1 */
+  {
+    { RM_TABLE (RM_VEX_0F3849_P_0_W_0_M_1_R_0) },
+  },
   /* REG_VEX_0F38F3 */
   {
     { Bad_Opcode },
@@ -5865,6 +5949,22 @@ static const struct dis386 prefix_table[][4] = {
     { "vpsllv%LW", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F3849 */
+  {
+    { VEX_W_TABLE (VEX_W_0F3849_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3849_P_2) },
+    { VEX_W_TABLE (VEX_W_0F3849_P_3) },
+  },
+
+  /* PREFIX_VEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384B_P_1) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_2) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_3) },
+  },
+
   /* PREFIX_VEX_0F3858 */
   {
     { Bad_Opcode },
@@ -5886,6 +5986,21 @@ static const struct dis386 prefix_table[][4] = {
     { MOD_TABLE (MOD_VEX_0F385A_PREFIX_2) },
   },
 
+  /* PREFIX_VEX_0F385C */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F385C_P_1) },
+    { Bad_Opcode },
+  },
+
+  /* PREFIX_VEX_0F385E */
+  {
+    { VEX_W_TABLE (VEX_W_0F385E_P_0) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_1) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_2) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_3) },
+  },
+
   /* PREFIX_VEX_0F3878 */
   {
     { Bad_Opcode },
@@ -6901,6 +7016,78 @@ static const struct dis386 x86_64_table[][2] = {
     { "lidt{Q|Q}", { M }, 0 },
     { "lidt", { M }, 0 },
   },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "ldtilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilerelease", { Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "sttilecfg", { M }, 0 },
+  },
+
+  /* X86_64_VEX_0F3849_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilezero", { TMM, Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tilestored", { MVexSIBMEM, TMM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloaddt1", { TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* X86_64_VEX_0F384B_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tileloadd", { TMM, MVexSIBMEM }, 0 },
+  },
+
+  /* X86_64_VEX_0F385C_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbf16ps", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_0_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbuud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_1_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbsud", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_2_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbusd", { TMM, EXtmm, VexTmm }, 0 },
+  },
+
+  /* X86_64_VEX_0F385E_P_3_W_0_M_0_L_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbssd", { TMM, EXtmm, VexTmm }, 0 },
+  },
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8742,9 +8929,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3847) },
     /* 48 */
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -8763,9 +8950,9 @@ static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3859) },
     { PREFIX_TABLE (PREFIX_VEX_0F385A) },
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385C) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385E) },
     { Bad_Opcode },
     /* 60 */
     { Bad_Opcode },
@@ -9503,12 +9690,72 @@ static const struct dis386 vex_len_table[][2] = {
     { "vphminposuw",	{ XM, EXx }, 0 },
   },
 
+  /* VEX_LEN_0F3849_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_0_W_0_M_1_REG_0_RM_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F3849_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F3849_P_3_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F384B_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F384B_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F385A_P_2_M_0 */
   {
     { Bad_Opcode },
     { VEX_W_TABLE (VEX_W_0F385A_P_2_M_0) },
   },
 
+  /* VEX_LEN_0F385C_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385C_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_0_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_0_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_1_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_1_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_2_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_2_W_0_M_0_L_0) },
+  },
+
+  /* VEX_LEN_0F385E_P_3_W_0_M_0 */
+  {
+    { X86_64_TABLE (X86_64_VEX_0F385E_P_3_W_0_M_0_L_0) },
+  },
+
   /* VEX_LEN_0F38DB_P_2 */
   {
     { "vaesimc",	{ XM, EXx }, 0 },
@@ -10201,6 +10448,30 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3846_P_2 */
     { "vpsravd",	{ XM, Vex, EXx }, 0 },
   },
+  {
+    /* VEX_W_0F3849_P_0 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_2 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_3 */
+    { MOD_TABLE (MOD_VEX_0F3849_P_3_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_1 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_2 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F384B_P_3 */
+    { MOD_TABLE (MOD_VEX_0F384B_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3858_P_2 */
     { "vpbroadcastd", { XM, EXxmm_md }, 0 },
@@ -10213,6 +10484,26 @@ static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F385A_P_2_M_0 */
     { "vbroadcasti128", { XM, Mxmm }, 0 },
   },
+  {
+    /* VEX_W_0F385C_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385C_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_0 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_0_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_1 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_1_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_2 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_2_W_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_3 */
+    { MOD_TABLE (MOD_VEX_0F385E_P_3_W_0) },
+  },
   {
     /* VEX_W_0F3878_P_2 */
     { "vpbroadcastb",	{ XM, EXxmm_mb }, 0 },
@@ -10805,6 +11096,57 @@ static const struct dis386 mod_table[][2] = {
     /* MOD_0F382A_PREFIX_2 */
     { "movntdqa",	{ XM, Mx }, 0 },
   },
+  {
+    /* MOD_VEX_0F3849_P_0_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_0) },
+    { REG_TABLE (REG_VEX_0F3849_P_0_W_0_M_1) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F3849_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_1_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_2_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F384B_P_3_W_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F384B_P_3_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385C_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385C_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_0_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_0_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_1_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_1_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_2_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_2_W_0_M_0) },
+  },
+  {
+    /* MOD_VEX_0F385E_P_3_W_0 */
+    { Bad_Opcode },
+    { VEX_LEN_TABLE (VEX_LEN_0F385E_P_3_W_0_M_0) },
+  },
   {
     /* MOD_0F38F5_PREFIX_2 */
     { "wrussK",		{ M, Gdq }, PREFIX_OPCODE },
@@ -11371,6 +11713,10 @@ static const struct dis386 rm_table[][8] = {
     { "sfence",		{ Skip_MODRM }, 0 },
 
   },
+  {
+    /* RM_VEX_0F3849_P_0_W_0_M_1_R_0 */
+    { VEX_LEN_TABLE (VEX_LEN_0F3849_P_0_W_0_M_1_REG_0_RM_0) },
+  },
 };
 
 #define INTERNAL_DISASSEMBLER_ERROR _("<internal disassembler error>")
@@ -12267,6 +12613,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = intel_names_xmm;
       names_ymm = intel_names_ymm;
       names_zmm = intel_names_zmm;
+      names_tmm = intel_names_tmm;
       index64 = intel_index64;
       index32 = intel_index32;
       names_mask = intel_names_mask;
@@ -12289,6 +12636,7 @@ print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = att_names_xmm;
       names_ymm = att_names_ymm;
       names_zmm = att_names_zmm;
+      names_tmm = att_names_tmm;
       index64 = att_index64;
       index32 = att_index32;
       names_mask = att_names_mask;
@@ -12419,6 +12767,9 @@ print_insn (bfd_vma pc, disassemble_info *info)
   need_vex = 0;
   need_vex_reg = 0;
   memset (&vex, 0, sizeof (vex));
+  amx_operands.tmm0 = -1;
+  amx_operands.tmm1 = -1;
+  amx_operands.tmm2 = -1;
 
   if (dp->name == NULL && dp->op[0].bytemode == FLOATCODE)
     {
@@ -14444,6 +14795,15 @@ OP_E_memory (int bytemode, int sizeflag)
 	  base = sib.base;
 	  codep++;
 	}
+      else
+	{
+	  /* An operand with mandatory non-vector SIB must have a SIB byte.  */
+	  if (bytemode == vex_sibmem_mode)
+	    {
+	      oappend ("(bad)");
+	      return;
+	    }
+	}
       rbase = base + add;
 
       switch (modrm.mod)
@@ -15471,6 +15831,7 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != scalar_mode)
     {
       switch (vex.length)
@@ -15509,6 +15870,16 @@ OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+	{
+	  oappend ("(bad)");
+	  return;
+	}
+      amx_operands.tmm0 = reg;
+      names = names_tmm;
+    }
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15633,6 +16004,7 @@ OP_EX (int bytemode, int sizeflag)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != d_scalar_swap_mode
       && bytemode != q_scalar_swap_mode
       && bytemode != vex_scalar_w_dq_mode)
@@ -15668,6 +16040,16 @@ OP_EX (int bytemode, int sizeflag)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+	{
+	  oappend ("(bad)");
+	  return;
+	}
+      amx_operands.tmm1 = reg;
+      names = names_tmm;
+    }
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -16223,6 +16605,18 @@ OP_VEX (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       return;
     }
 
+  if (bytemode == tmm_mode)
+    {
+      if (reg >= 8)
+	{
+	  oappend ("(bad)");
+	  return;
+	}
+      amx_operands.tmm2 = reg;
+      oappend (names_tmm[reg]);
+      return;
+    }
+
   switch (vex.length)
     {
     case 128:
@@ -16290,6 +16684,41 @@ OP_VEX (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   oappend (names[reg]);
 }
 
+static void
+OP_VEX_TMM_Fixup (int bytemode, int sizeflag)
+{
+  OP_VEX (bytemode, sizeflag);
+
+  if (amx_operands.tmm0 != -1
+      && amx_operands.tmm1 != -1
+      && amx_operands.tmm2 != -1)
+  {
+    /* All 3 TMM registers must be distinct.  */
+    if (amx_operands.tmm1 == amx_operands.tmm0
+	&& amx_operands.tmm2 == amx_operands.tmm0)
+      {
+	strcpy (op_out[0], "(bad)");
+	strcpy (op_out[1], "(bad)");
+	strcpy (op_out[2], "(bad)");
+      }
+    else if (amx_operands.tmm1 == amx_operands.tmm0)
+      {
+	strcpy (op_out[0], "(bad)");
+	strcpy (op_out[1], "(bad)");
+      }
+    else if (amx_operands.tmm2 == amx_operands.tmm0)
+      {
+	strcpy (op_out[0], "(bad)");
+	strcpy (op_out[2], "(bad)");
+      }
+    else if (amx_operands.tmm2 == amx_operands.tmm1)
+      {
+	strcpy (op_out[1], "(bad)");
+	strcpy (op_out[2], "(bad)");
+      }
+  }
+}
+
 static void
 OP_VexW (int bytemode, int sizeflag)
 {
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 7230f87344..3334155071 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -297,6 +297,12 @@ static initializer cpu_flag_init[] =
     "CpuWAITPKG" },
   { "CPU_CLDEMOTE_FLAGS",
     "CpuCLDEMOTE" },
+  { "CPU_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_AMX_TILE_FLAGS",
+    "CpuAMX_TILE" },
   { "CPU_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_MOVDIR64B_FLAGS",
@@ -383,6 +389,12 @@ static initializer cpu_flag_init[] =
     "CpuAVX512_BITALG" },
   { "CPU_ANY_AVX512_BF16_FLAGS",
     "CpuAVX512_BF16" },
+  { "CPU_ANY_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_ANY_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_ANY_AMX_TILE_FLAGS",
+    "CpuAMX_TILE|CpuAMX_INT8|CpuAMX_BF16" },
   { "CPU_ANY_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_ANY_MOVDIR64B_FLAGS",
@@ -459,6 +471,8 @@ static initializer operand_type_init[] =
     "Class=RegSIMD|Ymmword" },
   { "OPERAND_TYPE_REGZMM",
     "Class=RegSIMD|Zmmword" },
+  { "OPERAND_TYPE_REGTMM",
+    "Class=RegSIMD|Tmmword" },
   { "OPERAND_TYPE_REGMASK",
     "Class=RegMask" },
   { "OPERAND_TYPE_REGBND",
@@ -611,6 +625,9 @@ static bitfield cpu_flags[] =
   BITFIELD (CpuPCONFIG),
   BITFIELD (CpuWAITPKG),
   BITFIELD (CpuCLDEMOTE),
+  BITFIELD (CpuAMX_INT8),
+  BITFIELD (CpuAMX_BF16),
+  BITFIELD (CpuAMX_TILE),
   BITFIELD (CpuMOVDIRI),
   BITFIELD (CpuMOVDIR64B),
   BITFIELD (CpuENQCMD),
@@ -741,6 +758,7 @@ static bitfield operand_types[] =
   BITFIELD (Xmmword),
   BITFIELD (Ymmword),
   BITFIELD (Zmmword),
+  BITFIELD (Tmmword),
   BITFIELD (Unspecified),
 #ifdef OTUnused
   BITFIELD (OTUnused),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index c65febbe81..b8a6dfc25c 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -223,6 +223,12 @@ enum
   /* CET instructions support required */
   CpuIBT,
   CpuSHSTK,
+  /* AMX-INT8 instructions required */
+  CpuAMX_INT8,
+  /* AMX-BF16 instructions required */
+  CpuAMX_BF16,
+  /* AMX-TILE instructions required */
+  CpuAMX_TILE,
   /* GFNI instructions required */
   CpuGFNI,
   /* VAES instructions required */
@@ -372,6 +378,9 @@ typedef union i386_cpu_flags
       unsigned int cpuptwrite:1;
       unsigned int cpuibt:1;
       unsigned int cpushstk:1;
+      unsigned int cpuamx_int8:1;
+      unsigned int cpuamx_bf16:1;
+      unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
       unsigned int cpuvpclmulqdq:1;
@@ -574,7 +583,9 @@ enum
 #define VECSIB128	1
 #define VECSIB256	2
 #define VECSIB512	3
+#define SIBMEM		4
   SIB,
+
   /* SSE to AVX support required */
   SSE2AVX,
   /* No AVX equivalent */
@@ -702,7 +713,7 @@ typedef struct i386_opcode_modifier
   unsigned int vexw:2;
   unsigned int vexopcode:3;
   unsigned int vexsources:2;
-  unsigned int sib:2;
+  unsigned int sib:3;
   unsigned int sse2avx:1;
   unsigned int noavx:1;
   unsigned int evex:3;
@@ -807,6 +818,8 @@ enum
   Ymmword,
   /* ZMMWORD size.  */
   Zmmword,
+  /* TMMWORD size.  */
+  Tmmword,
   /* Unspecified memory size.  */
   Unspecified,
 
@@ -851,6 +864,7 @@ typedef union i386_operand_type
       unsigned int xmmword:1;
       unsigned int ymmword:1;
       unsigned int zmmword:1;
+      unsigned int tmmword:1;
       unsigned int unspecified:1;
 #ifdef OTUnused
       unsigned int unused:(OTNumOfBits - OTUnused);
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index a35817a7f1..bb7fb02dde 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -52,6 +52,7 @@
 #define RegXMM Class=RegSIMD|Xmmword
 #define RegYMM Class=RegSIMD|Ymmword
 #define RegZMM Class=RegSIMD|Zmmword
+#define RegTMM Class=RegSIMD|Tmmword
 
 #define RegMask Class=RegMask
 
@@ -88,6 +89,7 @@
 #define VecSIB128 SIB=VECSIB128
 #define VecSIB256 SIB=VECSIB256
 #define VecSIB512 SIB=VECSIB512
+#define Sibmem SIB=SIBMEM|Modrm
 
 #define EVex128 EVex=EVEX128
 #define EVex256 EVex=EVEX256
@@ -4093,3 +4095,24 @@ xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|
 xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
 
 // TSXLDTRK instructions end.
+
+// AMX instructions.
+
+ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+
+tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex128|VexOpcode=1|VexVVVV=1|VexW0|SwapSources|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+
+tileloadd, 2, 0xf24b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 2, 0x664b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 2, 0xf34b, None, 1, CpuAMX_TILE|Cpu64, Sibmem|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }
+
+tilerelease, 0, 0x49c0, None, 2, CpuAMX_TILE|Cpu64, Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
+
+tilezero, 1, 0xf249, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex128|VexOpcode=1|VexW0|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM }
+
+// AMX instructions end.
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index cdff763ca7..ca7eeba488 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -278,6 +278,15 @@ zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval
 zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval
 zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval
 zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval
+// TMM registers for AMX
+tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval
+tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval
+tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval
+tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval
+tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval
+tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval
+tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval
+tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval
 // Bound registers for MPX
 bnd0, Class=RegBND, 0, 0, Dw2Inval, Dw2Inval
 bnd1, Class=RegBND, 0, 1, Dw2Inval, Dw2Inval
-- 
2.17.1
Thanks,
Lili.

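For anyone checking the expected encodings in x86-64-amx.d by hand, here is a
minimal standalone sketch (not part of the patch) of how the three TMM operands
of a register-only tdp* instruction can be pulled out of the 5-byte encoding,
plus the distinctness rule that OP_VEX_TMM_Fixup and the new gas error enforce.
The names struct amx_regs, amx_decode_regs and amx_regs_distinct are
hypothetical and do not exist in binutils. It assumes a 3-byte VEX prefix (c4)
and ModRM.mod == 3, and it ignores the VEX.R/X/B bits since only tmm0-tmm7 are
valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper type for illustration; not a binutils structure.  */
struct amx_regs
{
  unsigned int dest;	/* ModRM.reg, last operand in AT&T syntax.  */
  unsigned int src1;	/* ModRM.rm, middle operand in AT&T syntax.  */
  unsigned int src2;	/* VEX.vvvv, first operand in AT&T syntax.  */
};

/* Decode the register numbers of a 5-byte instruction laid out as
   c4, VEX byte 1, VEX byte 2, opcode, ModRM.
   Assumption: register-register form only (ModRM.mod == 3).  */
static struct amx_regs
amx_decode_regs (const uint8_t *insn)
{
  struct amx_regs r;

  assert (insn[0] == 0xc4);
  assert ((insn[4] >> 6) == 3);

  /* VEX.vvvv lives in bits 6:3 of the second VEX byte, stored inverted.  */
  r.src2 = ((insn[2] >> 3) & 0xf) ^ 0xf;
  r.dest = (insn[4] >> 3) & 7;
  r.src1 = insn[4] & 7;
  return r;
}

/* All three TMM registers must be distinct, mirroring the check in the
   patch.  */
static bool
amx_regs_distinct (struct amx_regs r)
{
  return r.dest != r.src1 && r.dest != r.src2 && r.src1 != r.src2;
}
```

Feeding it the bytes c4 e2 63 5e ca from the test yields dest 1, src1 2 and
src2 3, which lines up with "tdpbssd %tmm3,%tmm2,%tmm1" in the expected
disassembly. A real decoder would also have to reject register numbers of 8 or
above, as the patch does by printing "(bad)".
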
Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index ae2a2c1a53..a899d5ec0e 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -290,6 +290,7 @@  enum i386_error
     unsupported_with_intel_mnemonic,
     unsupported_syntax,
     unsupported,
+    invalid_sib_address,
     invalid_vsib_address,
     invalid_vector_register_set,
     unsupported_vector_index_register,
@@ -372,6 +373,9 @@  struct _i386_insn
     /* Has ZMM register operands.  */
     bfd_boolean has_regzmm;
 
+    /* Has TMM register operands.  */
+    bfd_boolean has_regtmm;
+
     /* Has GOTPC or TLS relocation.  */
     bfd_boolean has_gotpc_tls_reloc;
 
@@ -1202,6 +1206,12 @@  static const arch_entry cpu_arch[] =
     CPU_WAITPKG_FLAGS, 0 },
   { STRING_COMMA_LEN (".cldemote"), PROCESSOR_UNKNOWN,
     CPU_CLDEMOTE_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_int8"), PROCESSOR_UNKNOWN,
+    CPU_AMX_INT8_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_bf16"), PROCESSOR_UNKNOWN,
+    CPU_AMX_BF16_FLAGS, 0 },
+  { STRING_COMMA_LEN (".amx_tile"), PROCESSOR_UNKNOWN,
+    CPU_AMX_TILE_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdiri"), PROCESSOR_UNKNOWN,
     CPU_MOVDIRI_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdir64b"), PROCESSOR_UNKNOWN,
@@ -1260,6 +1270,9 @@  static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("noavx512_bitalg"), CPU_ANY_AVX512_BITALG_FLAGS },
   { STRING_COMMA_LEN ("noibt"), CPU_ANY_IBT_FLAGS },
   { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },
+  { STRING_COMMA_LEN ("noamx_int8"), CPU_ANY_AMX_INT8_FLAGS },
+  { STRING_COMMA_LEN ("noamx_bf16"), CPU_ANY_AMX_BF16_FLAGS },
+  { STRING_COMMA_LEN ("noamx_tile"), CPU_ANY_AMX_TILE_FLAGS },
   { STRING_COMMA_LEN ("nomovdiri"), CPU_ANY_MOVDIRI_FLAGS },
   { STRING_COMMA_LEN ("nomovdir64b"), CPU_ANY_MOVDIR64B_FLAGS },
   { STRING_COMMA_LEN ("noavx512_bf16"), CPU_ANY_AVX512_BF16_FLAGS },
@@ -2297,6 +2310,7 @@  operand_type_match (i386_operand_type overlap,
   temp.bitfield.xmmword = 0;
   temp.bitfield.ymmword = 0;
   temp.bitfield.zmmword = 0;
+  temp.bitfield.tmmword = 0;
   if (operand_type_all_zero (&temp))
     goto mismatch;
 
@@ -3305,6 +3319,7 @@  const type_names[] =
   { OPERAND_TYPE_REGXMM, "rXMM" },
   { OPERAND_TYPE_REGYMM, "rYMM" },
   { OPERAND_TYPE_REGZMM, "rZMM" },
+  { OPERAND_TYPE_REGTMM, "rTMM" },
   { OPERAND_TYPE_REGMASK, "Mask reg" },
 };
 
@@ -5793,9 +5808,18 @@  check_VecOperands (const insn_template *t)
       return 1;
     }
 
+  /* Disallow using IP register for the mandatory non-vector SIB.  */
+  if (t->opcode_modifier.sib == SIBMEM
+      && i.base_reg
+      && i.base_reg->reg_num == RegIP)
+    {
+	i.error = invalid_sib_address;
+	return 1;
+    }
+
   /* For VSIB byte, we need a vector register for index, and all vector
      registers must be distinct.  */
-  if (t->opcode_modifier.sib)
+  if (t->opcode_modifier.sib && t->opcode_modifier.sib != SIBMEM)
     {
       if (!i.index_reg
 	  || !((t->opcode_modifier.sib == VECSIB128
@@ -6589,6 +6613,9 @@  match_template (char mnem_suffix)
 	  as_bad (_("unsupported instruction `%s'"),
 		  current_templates->start->name);
 	  return NULL;
+	case invalid_sib_address:
+	  err_msg = _("invalid SIB address");
+	  break;
 	case invalid_vsib_address:
 	  err_msg = _("invalid VSIB address");
 	  break;
@@ -7791,12 +7818,22 @@  build_modrm_byte (void)
      operands, it must be a instruction with VexNDS.  For a
      instruction with VexNDD, the destination register is encoded
      in VEX prefix.  If there are 4 register operands, it must be
-     a instruction with VEX prefix and 3 sources.  */
+     an instruction with VEX prefix and 3 sources.  For an
+     instruction with 3 register operands, VEXOP3 indicates that
+     the VEX.vvvv field encodes the third operand, unlike the
+     VEXXDS case where VEX.vvvv normally encodes the second
+     operand.  Here the second operand means OP2 and the third
+     operand means OP3 in the following Intel-syntax assembly
+     code:
+
+        INST_OP OP1, OP2, OP3
+   */
   if (i.mem_operands == 0
       && ((i.reg_operands == 2
 	   && i.tm.opcode_modifier.vexvvvv <= VEXXDS)
 	  || (i.reg_operands == 3
-	      && i.tm.opcode_modifier.vexvvvv == VEXXDS)
+	      && (i.tm.opcode_modifier.vexvvvv == VEXXDS
+		  || i.tm.opcode_modifier.vexvvvv == VEXOP3))
 	  || (i.reg_operands == 4 && vex_3_sources)))
     {
       switch (i.operands)
@@ -7808,10 +7845,11 @@  build_modrm_byte (void)
 	  /* When there are 3 operands, one of them may be immediate,
 	     which may be the first or the last operand.  Otherwise,
 	     the first operand must be shift count register (cl) or it
-	     is an instruction with VexNDS. */
+	     is an instruction with VexNDS or VEXOP3. */
 	  gas_assert (i.imm_operands == 1
 		      || (i.imm_operands == 0
 			  && (i.tm.opcode_modifier.vexvvvv == VEXXDS
+			      || i.tm.opcode_modifier.vexvvvv == VEXOP3
 			      || (i.types[0].bitfield.instance == RegC
 				  && i.types[0].bitfield.byte))));
 	  if (operand_type_check (i.types[0], imm)
@@ -7910,6 +7948,19 @@  build_modrm_byte (void)
 	      i.vex.register_specifier = i.op[vvvv].regs;
 	      dest++;
 	    }
+	  /* Unlike VEXXDS, VEXOP3 uses VEX.vvvv to encode the third
+	     operand, which is i.op[source].  At this stage source is 0
+	     and dest is 1.  Advance source so that it refers to the
+	     second operand and dest so that it refers to the first
+	     operand, which is also the destination register; the
+	     remaining code then takes care of encoding those two
+	     operands.  */
+	  else if (i.tm.opcode_modifier.vexvvvv == VEXOP3)
+	    {
+	      i.vex.register_specifier = i.op[source].regs;
+	      source++;
+	      dest++;
+	    }
 	}
 
       i.rm.mode = 3;
@@ -7936,6 +7987,9 @@  build_modrm_byte (void)
 	      else if (i.types[dest].bitfield.ymmword
 		       || i.types[source].bitfield.ymmword)
 		i.has_regymm = TRUE;
+	      else if (i.types[dest].bitfield.tmmword
+		       || i.types[source].bitfield.tmmword)
+		i.has_regtmm = TRUE;
 	      else
 		i.has_regxmm = TRUE;
 	    }
@@ -7973,7 +8027,9 @@  build_modrm_byte (void)
 
 	  if (i.tm.opcode_modifier.sib)
 	    {
-	      if (i.index_reg->reg_num == RegIZ)
+	      /* The index register of VSIB shouldn't be RegIZ.  */
+	      if (i.tm.opcode_modifier.sib != SIBMEM
+		  && i.index_reg->reg_num == RegIZ)
 		abort ();
 
 	      i.rm.regmem = ESCAPE_TO_TWO_BYTE_ADDRESSING;
@@ -7996,8 +8052,19 @@  build_modrm_byte (void)
 		      i.types[op].bitfield.disp32s = 1;
 		    }
 		}
-	      i.sib.index = i.index_reg->reg_num;
-	      set_rex_vrex (i.index_reg, REX_X, FALSE);
+
+	      /* A VSIB always has an index register, so the logic
+		 for it is unchanged.  A non-vector SIB (SIBMEM)
+		 without an index register is allowed and is
+		 handled later.  */
+	      if (i.index_reg)
+		{
+		  if (i.index_reg->reg_num == RegIZ)
+		    i.sib.index = NO_INDEX_REGISTER;
+		  else
+		    i.sib.index = i.index_reg->reg_num;
+		  set_rex_vrex (i.index_reg, REX_X, FALSE);
+		}
 	    }
 
 	  default_seg = &ds;
@@ -8011,7 +8078,9 @@  build_modrm_byte (void)
 		{
 		  i386_operand_type newdisp;
 
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only check for VSIB.  */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);
 		  /* Operand is just <disp>  */
 		  if (flag_code == CODE_64BIT)
 		    {
@@ -8149,7 +8218,10 @@  build_modrm_byte (void)
 	      i.sib.scale = i.log2_scale_factor;
 	      if (i.index_reg == 0)
 		{
-		  gas_assert (!i.tm.opcode_modifier.sib);
+		  /* Only check for VSIB.  */
+		  gas_assert (!i.tm.opcode_modifier.sib
+			      || i.tm.opcode_modifier.sib == SIBMEM);
+
 		  /* <disp>(%esp) becomes two byte modrm with no index
 		     register.  We've already stored the code for esp
 		     in i.rm.regmem ie. ESCAPE_TO_TWO_BYTE_ADDRESSING.
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index d4e6fcb698..cb86cc7968 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -226,6 +226,12 @@  accept various extension mnemonics.  For example,
 @code{noenqcmd},
 @code{noserialize},
 @code{notsxldtrk},
+@code{amx_int8},
+@code{noamx_int8},
+@code{amx_bf16},
+@code{noamx_bf16},
+@code{amx_tile},
+@code{noamx_tile},
 @code{vmx},
 @code{vmfunc},
 @code{smx},
@@ -1504,6 +1510,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
 @item @samp{.movdiri} @tab @samp{.movdir64b} @tab @samp{.enqcmd} @tab @samp{.tsxldtrk}
+@item @samp{.amx_int8} @tab @samp{.amx_bf16} @tab @samp{.amx_tile}
 @item @samp{.3dnow} @tab @samp{.3dnowa} @tab @samp{.sse4a} @tab @samp{.sse5}
 @item @samp{.syscall} @tab @samp{.rdtscp} @tab @samp{.svme}
 @item @samp{.lwp} @tab @samp{.fma4} @tab @samp{.xop} @tab @samp{.cx16}
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index 6bee5fc9d8..fffa7e456b 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -1139,6 +1139,9 @@  if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-lfence-ret-d"
     run_dump_test "x86-64-lfence-ret-e"
     run_dump_test "x86-64-lfence-byte"
+    run_list_test "x86-64-amx-sibmem-inval"
+    run_dump_test "x86-64-amx"
+    run_dump_test "x86-64-amx-intel"
 
     if { ![istarget "*-*-aix*"]
       && ![istarget "*-*-beos*"]
diff --git a/gas/testsuite/gas/i386/x86-64-amx-intel.d b/gas/testsuite/gas/i386/x86-64-amx-intel.d
new file mode 100644
index 0000000000..d875f08bf3
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-intel.d
@@ -0,0 +1,69 @@ 
+#as:
+#objdump: -d -Mintel
+#name: x86_64 AMX insns in Intel syntax
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \[rcx\+rdx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \[rcx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \[rbx\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps tmm3,tmm4,tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud tmm1,tmm2,tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 tmm5,ds:0x0
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[rcx\+riz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 tmm5,\[ecx\+eiz\*1\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 tmm5,\[rcx\+rdx\*1\]
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 tmm1,\[ecx\+edx\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 tmm1,\[rcx\+riz\*2\]
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored \[rcx\+riz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored \[ecx\+eiz\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored \[rcx\+rdx\*1\],tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored \[ecx\+edx\*2\],tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero tmm7
diff --git a/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.l b/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.l
new file mode 100644
index 0000000000..d3a84646f4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.l
@@ -0,0 +1,7 @@ 
+.* Assembler messages:
+.*:5: Error: invalid SIB address for `tileloadd'
+.*:6: Error: invalid SIB address for `tileloaddt1'
+.*:7: Error: invalid SIB address for `tilestored'
+.*:10: Error: invalid SIB address for `tileloadd'
+.*:11: Error: invalid SIB address for `tileloaddt1'
+.*:12: Error: invalid SIB address for `tilestored'
diff --git a/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.s b/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.s
new file mode 100644
index 0000000000..31efebfb8f
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx-sibmem-inval.s
@@ -0,0 +1,12 @@ 
+# Check for SIBMEM operand used in certain AMX instructions
+
+    .text
+_start:
+    tileloadd (%rip), %tmm1
+    tileloaddt1 (%rip), %tmm1
+    tilestored  %tmm1, (%rip)
+
+    .intel_syntax noprefix
+    tileloadd tmm1, [rip]
+    tileloaddt1 tmm1, [rip]
+    tilestored  [rip], tmm1
diff --git a/gas/testsuite/gas/i386/x86-64-amx.d b/gas/testsuite/gas/i386/x86-64-amx.d
new file mode 100644
index 0000000000..5df3614de8
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.d
@@ -0,0 +1,69 @@ 
+#as:
+#objdump: -d
+#name: x86_64 AMX insns
+#source: x86-64-amx.s
+
+.*: +file format .*
+
+
+Disassembly of section \.text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 04 51[ 	]*ldtilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 04 51[ 	]*sttilecfg \(%rcx,%rdx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 01[ 	]*ldtilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 03[ 	]*ldtilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 01[ 	]*sttilecfg \(%rcx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 49 03[ 	]*sttilecfg \(%rbx\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 52 5c dc[ 	]*tdpbf16ps %tmm5,%tmm4,%tmm3
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 63 5e ca[ 	]*tdpbssd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 62 5e ca[ 	]*tdpbsud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 61 5e ca[ 	]*tdpbusd %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 60 5e ca[ 	]*tdpbuud %tmm3,%tmm2,%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 25 00[ 	]*tileloadd 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 2c 21[ 	]*tileloadd \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 4b 2c 11[ 	]*tileloadd \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7b 4b 0c 51[ 	]*tileloadd \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 25 00[ 	]*tileloaddt1 0x0,%tmm5
+[ 	]*[a-f0-9]+:[ 	]*00 00 00[ 	]*
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%rcx,%riz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 2c 21[ 	]*tileloaddt1 \(%ecx,%eiz,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 2c 11[ 	]*tileloaddt1 \(%rcx,%rdx,1\),%tmm5
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 79 4b 0c 51[ 	]*tileloaddt1 \(%ecx,%edx,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 79 4b 0c 61[ 	]*tileloaddt1 \(%rcx,%riz,2\),%tmm1
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 78 49 c0[ 	]*tilerelease *
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%rcx,%riz,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 2c 21[ 	]*tilestored %tmm5,\(%ecx,%eiz,1\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7a 4b 2c 11[ 	]*tilestored %tmm5,\(%rcx,%rdx,1\)
+[ 	]*[a-f0-9]+:[ 	]*67 c4 e2 7a 4b 0c 51[ 	]*tilestored %tmm1,\(%ecx,%edx,2\)
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 c0[ 	]*tilezero %tmm0
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 e8[ 	]*tilezero %tmm5
+[ 	]*[a-f0-9]+:[ 	]*c4 e2 7b 49 f8[ 	]*tilezero %tmm7
diff --git a/gas/testsuite/gas/i386/x86-64-amx.s b/gas/testsuite/gas/i386/x86-64-amx.s
new file mode 100644
index 0000000000..c70543152b
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-amx.s
@@ -0,0 +1,61 @@ 
+
+  .allow_index_reg
+  .text
+_start:
+  ldtilecfg  (%rcx,%rdx,2)
+  sttilecfg  (%rcx,%rdx,2)
+  tdpbf16ps %tmm5, %tmm4, %tmm3
+  tdpbssd %tmm3, %tmm2, %tmm1
+  tdpbsud %tmm3, %tmm2, %tmm1
+  tdpbusd %tmm3, %tmm2, %tmm1
+  tdpbuud %tmm3, %tmm2, %tmm1
+  tileloadd foo, %tmm5
+  tileloadd (%rcx), %tmm5
+  tileloadd (%ecx), %tmm5
+  tileloadd (%rcx,%rdx,1), %tmm5
+  tileloadd (%ecx,%edx,2), %tmm1
+  tileloaddt1 foo, %tmm5
+  tileloaddt1 (%rcx), %tmm5
+  tileloaddt1 (%ecx), %tmm5
+  tileloaddt1 (%rcx,%rdx,1), %tmm5
+  tileloaddt1 (%ecx,%edx,2), %tmm1
+  tileloaddt1 (%rcx,%riz,2), %tmm1
+  tilerelease
+  tilestored %tmm5, (%rcx)
+  tilestored %tmm5, (%ecx)
+  tilestored %tmm5, (%rcx,%rdx,1)
+  tilestored %tmm1, (%ecx,%edx,2)
+  tilezero %tmm0
+  tilezero %tmm5
+  tilezero %tmm7
+
+
+  .intel_syntax noprefix
+  ldtilecfg  [rcx]
+  ldtilecfg  [rbx]
+  sttilecfg  [rcx]
+  sttilecfg  [rbx]
+  tdpbf16ps tmm3, tmm4, tmm5
+  tdpbssd tmm1, tmm2, tmm3
+  tdpbsud tmm1, tmm2, tmm3
+  tdpbusd tmm1, tmm2, tmm3
+  tdpbuud tmm1, tmm2, tmm3
+  tileloadd tmm5, foo
+  tileloadd tmm5, [rcx]
+  tileloadd tmm5, [ecx]
+  tileloadd tmm5, [rcx+rdx]
+  tileloadd tmm1, [ecx+edx*2]
+  tileloaddt1 tmm5, foo
+  tileloaddt1 tmm5, [rcx]
+  tileloaddt1 tmm5, [ecx]
+  tileloaddt1 tmm5, [rcx+rdx]
+  tileloaddt1 tmm1, [ecx+edx*2]
+  tileloaddt1 tmm1, [rcx+riz*2]
+  tilerelease
+  tilestored [rcx], tmm5
+  tilestored [ecx], tmm5
+  tilestored [rcx+rdx], tmm5
+  tilestored [ecx+edx*2], tmm1
+  tilezero tmm0
+  tilezero tmm5
+  tilezero tmm7
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index e1ebb48553..e443918499 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -244,6 +244,7 @@  fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Bad_Opcode NULL, { { NULL, 0 } }, 0
 
 #define Eb { OP_E, b_mode }
+#define EV { OP_E, void_mode }
 #define Ebnd { OP_E, bnd_mode }
 #define EbS { OP_E, b_swap_mode }
 #define EbndS { OP_E, bnd_swap_mode }
@@ -374,6 +375,7 @@  fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define XMScalar { OP_XMM, scalar_mode }
 #define XMGatherQ { OP_XMM, vex_vsib_q_w_dq_mode }
 #define XMM { OP_XMM, xmm_mode }
+#define XMT { OP_XMM, tmm_mode }
 #define XMxmmq { OP_XMM, xmmq_mode }
 #define EM { OP_EM, v_mode }
 #define EMS { OP_EM, v_swap_mode }
@@ -393,6 +395,7 @@  fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define EXxS { OP_EX, x_swap_mode }
 #define EXxmm { OP_EX, xmm_mode }
 #define EXymm { OP_EX, ymm_mode }
+#define EXtmm { OP_EX, tmm_mode }
 #define EXxmmq { OP_EX, xmmq_mode }
 #define EXEvexHalfBcstXmmq { OP_EX, evex_half_bcst_xmmq_mode }
 #define EXxmm_mb { OP_EX, xmm_mb_mode }
@@ -423,6 +426,7 @@  fetch_data (struct disassemble_info *info, bfd_byte *addr)
 #define Vex128 { OP_VEX, vex128_mode }
 #define Vex256 { OP_VEX, vex256_mode }
 #define VexGdq { OP_VEX, dq_mode }
+#define Vextmm { OP_VEX, tmm_mode }
 #define EXdVexScalarS { OP_EX_Vex, d_scalar_swap_mode }
 #define EXqVexScalarS { OP_EX_Vex, q_scalar_swap_mode }
 #define EXVexW { OP_EX_VexW, x_mode }
@@ -544,8 +548,12 @@  enum
   ymmq_mode,
   /* 32-byte YMM or 16-byte word operand */
   ymmxmm_mode,
+  /* TMM operand */
+  tmm_mode,
   /* d_mode in 32bit, q_mode in 64bit mode.  */
   m_mode,
+  /* A generic memory operand.  */
+  void_mode,
   /* pair of v_mode operands */
   a_mode,
   cond_jump_mode,
@@ -749,6 +757,7 @@  enum
   REG_VEX_0F72,
   REG_VEX_0F73,
   REG_VEX_0FAE,
+  REG_VEX_W_0_0F3849_P_0_M_3,
   REG_VEX_0F38F3,
   REG_XOP_LWPCB,
   REG_XOP_LWP,
@@ -832,6 +841,17 @@  enum
   MOD_0FE7_PREFIX_2,
   MOD_0FF0_PREFIX_3,
   MOD_0F382A_PREFIX_2,
+  MOD_VEX_W_0_0F3849_P_0,
+  MOD_VEX_W_0_0F3849_P_2,
+  MOD_VEX_W_0_0F3849_P_3,
+  MOD_VEX_W_0_0F384B_P_1,
+  MOD_VEX_W_0_0F384B_P_2,
+  MOD_VEX_W_0_0F384B_P_3,
+  MOD_VEX_W_0_0F385C_P_1,
+  MOD_VEX_W_0_0F385E_P_0,
+  MOD_VEX_W_0_0F385E_P_1,
+  MOD_VEX_W_0_0F385E_P_2,
+  MOD_VEX_W_0_0F385E_P_3,
   MOD_0F38F5_PREFIX_2,
   MOD_0F38F6_PREFIX_0,
   MOD_0F38F8_PREFIX_1,
@@ -961,6 +981,7 @@  enum
   RM_0F1E_P_1_MOD_3_REG_7,
   RM_0FAE_REG_6_MOD_3_P_0,
   RM_0FAE_REG_7_MOD_3,
+  RM_VEX_W_0_0F3849_P_0_M_3_R_0
 };
 
 enum
@@ -1296,9 +1317,13 @@  enum
   PREFIX_VEX_0F3845,
   PREFIX_VEX_0F3846,
   PREFIX_VEX_0F3847,
+  PREFIX_VEX_0F3849,
+  PREFIX_VEX_0F384B,
   PREFIX_VEX_0F3858,
   PREFIX_VEX_0F3859,
   PREFIX_VEX_0F385A,
+  PREFIX_VEX_0F385C,
+  PREFIX_VEX_0F385E,
   PREFIX_VEX_0F3878,
   PREFIX_VEX_0F3879,
   PREFIX_VEX_0F388C,
@@ -1767,7 +1792,19 @@  enum
   X86_64_0F01_REG_0,
   X86_64_0F01_REG_1,
   X86_64_0F01_REG_2,
-  X86_64_0F01_REG_3
+  X86_64_0F01_REG_3,
+  X86_64_VEX_W_0_0F3849_P_0_M_0,
+  X86_64_0F3849_MOD_3_REG_0_RM_0,
+  X86_64_VEX_W_0_0F3849_P_2_M_0,
+  X86_64_VEX_W_0_0F3849_P_3_M_0,
+  X86_64_MOD_VEX_W_0_0F384B_P_1,
+  X86_64_MOD_VEX_W_0_0F384B_P_2,
+  X86_64_MOD_VEX_W_0_0F384B_P_3,
+  X86_64_MOD_VEX_W_0_0F385C_P_1,
+  X86_64_MOD_VEX_W_0_0F385E_P_0,
+  X86_64_MOD_VEX_W_0_0F385E_P_1,
+  X86_64_MOD_VEX_W_0_0F385E_P_2,
+  X86_64_MOD_VEX_W_0_0F385E_P_3
 };
 
 enum
@@ -2006,9 +2043,20 @@  enum
   VEX_W_0F382F_P_2_M_0,
   VEX_W_0F3836_P_2,
   VEX_W_0F3846_P_2,
+  VEX_W_0F3849_P_0,
+  VEX_W_0F3849_P_2,
+  VEX_W_0F3849_P_3,
+  VEX_W_0F384B_P_1,
+  VEX_W_0F384B_P_2,
+  VEX_W_0F384B_P_3,
   VEX_W_0F3858_P_2,
   VEX_W_0F3859_P_2,
   VEX_W_0F385A_P_2_M_0,
+  VEX_W_0F385C_P_1,
+  VEX_W_0F385E_P_0,
+  VEX_W_0F385E_P_1,
+  VEX_W_0F385E_P_2,
+  VEX_W_0F385E_P_3,
   VEX_W_0F3878_P_2,
   VEX_W_0F3879_P_2,
   VEX_W_0F38CF_P_2,
@@ -3153,6 +3201,16 @@  static const char *att_names_zmm[] = {
   "%zmm28", "%zmm29", "%zmm30", "%zmm31"
 };
 
+static const char **names_tmm;
+static const char *intel_names_tmm[] = {
+  "tmm0", "tmm1", "tmm2", "tmm3",
+  "tmm4", "tmm5", "tmm6", "tmm7"
+};
+static const char *att_names_tmm[] = {
+  "%tmm0", "%tmm1", "%tmm2", "%tmm3",
+  "%tmm4", "%tmm5", "%tmm6", "%tmm7"
+};
+
 static const char **names_mask;
 static const char *intel_names_mask[] = {
   "k0", "k1", "k2", "k3", "k4", "k5", "k6", "k7"
@@ -3521,6 +3579,10 @@  static const struct dis386 reg_table[][8] = {
     { MOD_TABLE (MOD_VEX_0FAE_REG_2) },
     { MOD_TABLE (MOD_VEX_0FAE_REG_3) },
   },
+  /* REG_VEX_W_0_0F3849_P_0_M_3 */
+  {
+    { RM_TABLE (RM_VEX_W_0_0F3849_P_0_M_3_R_0) },
+  },
   /* REG_VEX_0F38F3 */
   {
     { Bad_Opcode },
@@ -5902,6 +5964,22 @@  static const struct dis386 prefix_table[][4] = {
     { "vpsllv%LW", { XM, Vex, EXx }, 0 },
   },
 
+  /* PREFIX_VEX_0F3849 */
+  {
+    { VEX_W_TABLE (VEX_W_0F3849_P_0) },
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F3849_P_2) },
+    { VEX_W_TABLE (VEX_W_0F3849_P_3) },
+  },
+
+  /* PREFIX_VEX_0F384B */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F384B_P_1) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_2) },
+    { VEX_W_TABLE (VEX_W_0F384B_P_3) },
+  },
+
   /* PREFIX_VEX_0F3858 */
   {
     { Bad_Opcode },
@@ -5923,6 +6001,21 @@  static const struct dis386 prefix_table[][4] = {
     { MOD_TABLE (MOD_VEX_0F385A_PREFIX_2) },
   },
 
+  /* PREFIX_VEX_0F385C */
+  {
+    { Bad_Opcode },
+    { VEX_W_TABLE (VEX_W_0F385C_P_1) },
+    { Bad_Opcode },
+  },
+
+  /* PREFIX_VEX_0F385E */
+  {
+    { VEX_W_TABLE (VEX_W_0F385E_P_0) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_1) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_2) },
+    { VEX_W_TABLE (VEX_W_0F385E_P_3) },
+  },
+
   /* PREFIX_VEX_0F3878 */
   {
     { Bad_Opcode },
@@ -6938,6 +7031,78 @@  static const struct dis386 x86_64_table[][2] = {
     { "lidt{Q|Q}", { M }, 0 },
     { "lidt", { M }, 0 },
   },
+
+  /* X86_64_VEX_W_0_0F3849_P_0_M_0 */
+  {
+    { Bad_Opcode },
+    { "ldtilecfg", { EV }, 0 },
+  },
+
+  /* X86_64_0F3849_MOD_3_REG_0_RM_0 */
+  {
+    { Bad_Opcode },
+    { "tilerelease", { Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_VEX_W_0_0F3849_P_2_M_0 */
+  {
+    { Bad_Opcode },
+    { "sttilecfg", { EV }, 0 },
+  },
+
+  /* X86_64_VEX_W_0_0F3849_P_3_M_0 */
+  {
+    { Bad_Opcode },
+    { "tilezero", { XMT, Skip_MODRM }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F384B_P_1 */
+  {
+    { Bad_Opcode },
+    { "tilestored", { EV, XMT }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F384B_P_2 */
+  {
+    { Bad_Opcode },
+    { "tileloaddt1", { XMT, EV }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F384B_P_3 */
+  {
+    { Bad_Opcode },
+    { "tileloadd", { XMT, EV }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F385C_P_1 */
+  {
+    { Bad_Opcode },
+    { "tdpbf16ps", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F385E_P_0 */
+  {
+    { Bad_Opcode },
+    { "tdpbuud", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F385E_P_1 */
+  {
+    { Bad_Opcode },
+    { "tdpbsud", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F385E_P_2 */
+  {
+    { Bad_Opcode },
+    { "tdpbusd", { XMT, EXtmm, Vextmm }, 0 },
+  },
+
+  /* X86_64_MOD_VEX_W_0_0F385E_P_3 */
+  {
+    { Bad_Opcode },
+    { "tdpbssd", { XMT, EXtmm, Vextmm }, 0 },
+  },
 };
 
 static const struct dis386 three_byte_table[][256] = {
@@ -8779,9 +8944,9 @@  static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3847) },
     /* 48 */
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F3849) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F384B) },
     { Bad_Opcode },
     { Bad_Opcode },
     { Bad_Opcode },
@@ -8800,9 +8965,9 @@  static const struct dis386 vex_table[][256] = {
     { PREFIX_TABLE (PREFIX_VEX_0F3859) },
     { PREFIX_TABLE (PREFIX_VEX_0F385A) },
     { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385C) },
     { Bad_Opcode },
-    { Bad_Opcode },
-    { Bad_Opcode },
+    { PREFIX_TABLE (PREFIX_VEX_0F385E) },
     { Bad_Opcode },
     /* 60 */
     { Bad_Opcode },
@@ -10036,6 +10201,30 @@  static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F3846_P_2 */
     { "vpsravd",	{ XM, Vex, EXx }, 0 },
   },
+  {
+    /* VEX_W_0F3849_P_0 */
+    { MOD_TABLE (MOD_VEX_W_0_0F3849_P_0) },
+  },
+  {
+    /* VEX_W_0F3849_P_2 */
+    { MOD_TABLE (MOD_VEX_W_0_0F3849_P_2) },
+  },
+  {
+    /* VEX_W_0F3849_P_3 */
+    { MOD_TABLE (MOD_VEX_W_0_0F3849_P_3) },
+  },
+  {
+    /* VEX_W_0F384B_P_1 */
+    { MOD_TABLE (MOD_VEX_W_0_0F384B_P_1) },
+  },
+  {
+    /* VEX_W_0F384B_P_2 */
+    { MOD_TABLE (MOD_VEX_W_0_0F384B_P_2) },
+  },
+  {
+    /* VEX_W_0F384B_P_3 */
+    { MOD_TABLE (MOD_VEX_W_0_0F384B_P_3) },
+  },
   {
     /* VEX_W_0F3858_P_2 */
     { "vpbroadcastd", { XM, EXxmm_md }, 0 },
@@ -10048,6 +10237,26 @@  static const struct dis386 vex_w_table[][2] = {
     /* VEX_W_0F385A_P_2_M_0 */
     { "vbroadcasti128", { XM, Mxmm }, 0 },
   },
+  {
+    /* VEX_W_0F385C_P_1 */
+    { MOD_TABLE (MOD_VEX_W_0_0F385C_P_1) },
+  },
+  {
+    /* VEX_W_0F385E_P_0 */
+    { MOD_TABLE (MOD_VEX_W_0_0F385E_P_0) },
+  },
+  {
+    /* VEX_W_0F385E_P_1 */
+    { MOD_TABLE (MOD_VEX_W_0_0F385E_P_1) },
+  },
+  {
+    /* VEX_W_0F385E_P_2 */
+    { MOD_TABLE (MOD_VEX_W_0_0F385E_P_2) },
+  },
+  {
+    /* VEX_W_0F385E_P_3 */
+    { MOD_TABLE (MOD_VEX_W_0_0F385E_P_3) },
+  },
   {
     /* VEX_W_0F3878_P_2 */
     { "vpbroadcastb",	{ XM, EXxmm_mb }, 0 },
@@ -10474,6 +10683,57 @@  static const struct dis386 mod_table[][2] = {
     /* MOD_0F382A_PREFIX_2 */
     { "movntdqa",	{ XM, Mx }, 0 },
   },
+  {
+    /* MOD_VEX_W_0_0F3849_P_0 */
+    { X86_64_TABLE (X86_64_VEX_W_0_0F3849_P_0_M_0) },
+    { REG_TABLE (REG_VEX_W_0_0F3849_P_0_M_3) },
+  },
+  {
+    /* MOD_VEX_W_0_0F3849_P_2 */
+    { X86_64_TABLE (X86_64_VEX_W_0_0F3849_P_2_M_0) },
+  },
+  {
+    /* MOD_VEX_W_0_0F3849_P_3 */
+    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_VEX_W_0_0F3849_P_3_M_0) },
+  },
+  {
+    /* MOD_VEX_W_0_0F384B_P_1 */
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F384B_P_1) },
+  },
+  {
+    /* MOD_VEX_W_0_0F384B_P_2 */
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F384B_P_2) },
+  },
+  {
+    /* MOD_VEX_W_0_0F384B_P_3 */
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F384B_P_3) },
+  },
+  {
+    /* MOD_VEX_W_0_0F385C_P_1 */
+    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F385C_P_1) },
+  },
+  {
+    /* MOD_VEX_W_0_0F385E_P_0 */
+    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F385E_P_0) },
+  },
+  {
+    /* MOD_VEX_W_0_0F385E_P_1 */
+    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F385E_P_1) },
+  },
+  {
+    /* MOD_VEX_W_0_0F385E_P_2 */
+    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F385E_P_2) },
+  },
+  {
+    /* MOD_VEX_W_0_0F385E_P_3 */
+    { Bad_Opcode },
+    { X86_64_TABLE (X86_64_MOD_VEX_W_0_0F385E_P_3) },
+  },
   {
     /* MOD_0F38F5_PREFIX_2 */
     { "wrussK",		{ M, Gdq }, PREFIX_OPCODE },
@@ -11035,6 +11295,10 @@  static const struct dis386 rm_table[][8] = {
     { "sfence",		{ Skip_MODRM }, 0 },
 
   },
+  {
+    /* RM_VEX_W_0_0F3849_P_0_M_3_R_0 */
+    { X86_64_TABLE (X86_64_0F3849_MOD_3_REG_0_RM_0) },
+  },
 };
 
 #define INTERNAL_DISASSEMBLER_ERROR _("<internal disassembler error>")
@@ -11926,6 +12190,7 @@  print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = intel_names_xmm;
       names_ymm = intel_names_ymm;
       names_zmm = intel_names_zmm;
+      names_tmm = intel_names_tmm;
       index64 = intel_index64;
       index32 = intel_index32;
       names_mask = intel_names_mask;
@@ -11948,6 +12213,7 @@  print_insn (bfd_vma pc, disassemble_info *info)
       names_xmm = att_names_xmm;
       names_ymm = att_names_ymm;
       names_zmm = att_names_zmm;
+      names_tmm = att_names_tmm;
       index64 = att_index64;
       index32 = att_index32;
       names_mask = att_names_mask;
@@ -13451,6 +13717,8 @@  intel_operand_size (int bytemode, int sizeflag)
     }
   switch (bytemode)
     {
+    case void_mode:
+      break;
     case b_mode:
     case b_swap_mode:
     case dqb_mode:
@@ -15172,6 +15440,7 @@  OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != scalar_mode)
     {
       switch (vex.length)
@@ -15210,6 +15479,8 @@  OP_XMM (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    names = names_tmm;
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15334,6 +15605,7 @@  OP_EX (int bytemode, int sizeflag)
       && bytemode != xmmq_mode
       && bytemode != evex_half_bcst_xmmq_mode
       && bytemode != ymm_mode
+      && bytemode != tmm_mode
       && bytemode != d_scalar_mode
       && bytemode != d_scalar_swap_mode
       && bytemode != q_scalar_mode
@@ -15371,6 +15643,8 @@  OP_EX (int bytemode, int sizeflag)
 	  abort ();
 	}
     }
+  else if (bytemode == tmm_mode)
+    names = names_tmm;
   else if (bytemode == ymm_mode)
     names = names_ymm;
   else
@@ -15926,6 +16200,12 @@  OP_VEX (int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       return;
     }
 
+  if (bytemode == tmm_mode)
+    {
+      oappend (names_tmm[reg]);
+      return;
+    }
+
   switch (vex.length)
     {
     case 128:
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index e7454db5d4..bc900bdb76 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -297,6 +297,12 @@  static initializer cpu_flag_init[] =
     "CpuWAITPKG" },
   { "CPU_CLDEMOTE_FLAGS",
     "CpuCLDEMOTE" },
+  { "CPU_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_AMX_TILE_FLAGS",
+    "CpuAMX_TILE" },
   { "CPU_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_MOVDIR64B_FLAGS",
@@ -383,6 +389,12 @@  static initializer cpu_flag_init[] =
     "CpuAVX512_BITALG" },
   { "CPU_ANY_AVX512_BF16_FLAGS",
     "CpuAVX512_BF16" },
+  { "CPU_ANY_AMX_INT8_FLAGS",
+    "CpuAMX_INT8" },
+  { "CPU_ANY_AMX_BF16_FLAGS",
+    "CpuAMX_BF16" },
+  { "CPU_ANY_AMX_TILE_FLAGS",
+    "CpuAMX_TILE" },
   { "CPU_ANY_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_ANY_MOVDIR64B_FLAGS",
@@ -459,6 +471,8 @@  static initializer operand_type_init[] =
     "Class=RegSIMD|Ymmword" },
   { "OPERAND_TYPE_REGZMM",
     "Class=RegSIMD|Zmmword" },
+  { "OPERAND_TYPE_REGTMM",
+    "Class=RegSIMD|Tmmword" },
   { "OPERAND_TYPE_REGMASK",
     "Class=RegMask" },
   { "OPERAND_TYPE_REGBND",
@@ -611,6 +625,9 @@  static bitfield cpu_flags[] =
   BITFIELD (CpuPCONFIG),
   BITFIELD (CpuWAITPKG),
   BITFIELD (CpuCLDEMOTE),
+  BITFIELD (CpuAMX_INT8),
+  BITFIELD (CpuAMX_BF16),
+  BITFIELD (CpuAMX_TILE),
   BITFIELD (CpuMOVDIRI),
   BITFIELD (CpuMOVDIR64B),
   BITFIELD (CpuENQCMD),
@@ -740,6 +757,7 @@  static bitfield operand_types[] =
   BITFIELD (Xmmword),
   BITFIELD (Ymmword),
   BITFIELD (Zmmword),
+  BITFIELD (Tmmword),
   BITFIELD (Unspecified),
 #ifdef OTUnused
   BITFIELD (OTUnused),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 174438698e..fa0e64ab63 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -223,6 +223,12 @@  enum
   /* CET instructions support required */
   CpuIBT,
   CpuSHSTK,
+  /* AMX-INT8 instructions required */
+  CpuAMX_INT8,
+  /* AMX-BF16 instructions required */
+  CpuAMX_BF16,
+  /* AMX-TILE instructions required */
+  CpuAMX_TILE,
   /* GFNI instructions required */
   CpuGFNI,
   /* VAES instructions required */
@@ -372,6 +378,9 @@  typedef union i386_cpu_flags
       unsigned int cpuptwrite:1;
       unsigned int cpuibt:1;
       unsigned int cpushstk:1;
+      unsigned int cpuamx_int8:1;
+      unsigned int cpuamx_bf16:1;
+      unsigned int cpuamx_tile:1;
       unsigned int cpugfni:1;
       unsigned int cpuvaes:1;
       unsigned int cpuvpclmulqdq:1;
@@ -528,10 +537,12 @@  enum
      instructions with 1 destination register operand.
      3. VEX.LWP.  Register destination is encoded in VEX.vvvv and one
 	of the operands can access a memory location.
+     4. VEX.OP3.  Use VEX.vvvv to encode the third operand.
    */
 #define VEXXDS	1
 #define VEXNDD	2
 #define VEXLWP	3
+#define VEXOP3	4
   VexVVVV,
   /* How the VEX.W bit is used:
      0: Set by the REX.W bit.
@@ -574,7 +585,9 @@  enum
 #define VECSIB128	1
 #define VECSIB256	2
 #define VECSIB512	3
+#define SIBMEM		4
   SIB,
+
   /* SSE to AVX support required */
   SSE2AVX,
   /* No AVX equivalent */
@@ -695,11 +708,11 @@  typedef struct i386_opcode_modifier
   unsigned int norex64:1;
   unsigned int ugh:1;
   unsigned int vex:2;
-  unsigned int vexvvvv:2;
+  unsigned int vexvvvv:3;
   unsigned int vexw:2;
   unsigned int vexopcode:3;
   unsigned int vexsources:2;
-  unsigned int sib:2;
+  unsigned int sib:3;
   unsigned int sse2avx:1;
   unsigned int noavx:1;
   unsigned int evex:3;
@@ -803,6 +816,8 @@  enum
   Ymmword,
   /* ZMMWORD size.  */
   Zmmword,
+  /* TMMWORD size.  */
+  Tmmword,
   /* Unspecified memory size.  */
   Unspecified,
 
@@ -847,6 +862,7 @@  typedef union i386_operand_type
       unsigned int xmmword:1;
       unsigned int ymmword:1;
       unsigned int zmmword:1;
+      unsigned int tmmword:1;
       unsigned int unspecified:1;
 #ifdef OTUnused
       unsigned int unused:(OTNumOfBits - OTUnused);
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index ded96884c0..904a95e6e4 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -52,6 +52,7 @@ 
 #define RegXMM Class=RegSIMD|Xmmword
 #define RegYMM Class=RegSIMD|Ymmword
 #define RegZMM Class=RegSIMD|Zmmword
+#define RegTMM Class=RegSIMD|Tmmword
 
 #define RegMask Class=RegMask
 
@@ -80,6 +81,11 @@ 
 #define VexW0 VexW=VEXW0
 #define VexW1 VexW=VEXW1
 #define VexWIG VexW=VEXWIG
+#define VexOP3 VexVVVV=VEXOP3
+#define VecSIB128 SIB=VECSIB128
+#define VecSIB256 SIB=VECSIB256
+#define VecSIB512 SIB=VECSIB512
+#define Sibmem SIB=SIBMEM
 
 #define Vex128 Vex=VEX128
 #define Vex256 Vex=VEX256
@@ -4093,3 +4099,25 @@  xsusldtrk, 0, 0xf20f01e8, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|
 xresldtrk, 0, 0xf20f01e9, None, 3, CpuTSXLDTRK, No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { 0 }
 
 // TSXLDTRK instructions end.
+
+// AMX instructions.
+
+ldtilecfg, 1, 0x49, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+sttilecfg, 1, 0x6649, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex }
+
+// VexOP3 indicates that the VEX.vvvv field encodes the third operand.
+tdpbf16ps, 3, 0xf35c, None, 1, CpuAMX_BF16|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbssd, 3, 0xf25e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbuud, 3, 0x5e,   None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbusd, 3, 0x665e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+tdpbsud, 3, 0xf35e, None, 1, CpuAMX_INT8|Cpu64, Modrm|Vex|VexOpcode=1|VexOP3|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, RegTMM, RegTMM }
+
+tileloadd, 2, 0xf24b, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tileloaddt1, 2, 0x664b, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Unspecified|BaseIndex, RegTMM }
+tilestored, 2, 0xf34b, None, 1, CpuAMX_TILE|Cpu64, Modrm|Sibmem|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM, Unspecified|BaseIndex }
+
+tilerelease, 0, 0x49, 0xc0, 1, CpuAMX_TILE|Cpu64, Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ImmExt, { 0 }
+
+tilezero, 1, 0xf249, None, 1, CpuAMX_TILE|Cpu64, Modrm|Vex|VexOpcode=1|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegTMM }
+
+// AMX instructions end.
diff --git a/opcodes/i386-reg.tbl b/opcodes/i386-reg.tbl
index cdff763ca7..ca7eeba488 100644
--- a/opcodes/i386-reg.tbl
+++ b/opcodes/i386-reg.tbl
@@ -278,6 +278,15 @@  zmm28, Class=RegSIMD|Zmmword, RegVRex|RegRex, 4, Dw2Inval, Dw2Inval
 zmm29, Class=RegSIMD|Zmmword, RegVRex|RegRex, 5, Dw2Inval, Dw2Inval
 zmm30, Class=RegSIMD|Zmmword, RegVRex|RegRex, 6, Dw2Inval, Dw2Inval
 zmm31, Class=RegSIMD|Zmmword, RegVRex|RegRex, 7, Dw2Inval, Dw2Inval
+// TMM registers for AMX
+tmm0, Class=RegSIMD|Tmmword, 0, 0, Dw2Inval, Dw2Inval
+tmm1, Class=RegSIMD|Tmmword, 0, 1, Dw2Inval, Dw2Inval
+tmm2, Class=RegSIMD|Tmmword, 0, 2, Dw2Inval, Dw2Inval
+tmm3, Class=RegSIMD|Tmmword, 0, 3, Dw2Inval, Dw2Inval
+tmm4, Class=RegSIMD|Tmmword, 0, 4, Dw2Inval, Dw2Inval
+tmm5, Class=RegSIMD|Tmmword, 0, 5, Dw2Inval, Dw2Inval
+tmm6, Class=RegSIMD|Tmmword, 0, 6, Dw2Inval, Dw2Inval
+tmm7, Class=RegSIMD|Tmmword, 0, 7, Dw2Inval, Dw2Inval
 // Bound registers for MPX
 bnd0, Class=RegBND, 0, 0, Dw2Inval, Dw2Inval
 bnd1, Class=RegBND, 0, 1, Dw2Inval, Dw2Inval