x86: Support Intel AVX512 BF16

Message ID 20190405180128.GA12026@intel.com
State New

Commit Message

H.J. Lu April 5, 2019, 6:01 p.m.
Add assembler and disassembler support for Intel AVX512 BF16:

https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

gas/

2019-04-05  Xuepeng Guo  <xuepeng.guo@intel.com>

	* config/tc-i386.c (cpu_arch): Add .avx512_bf16.
	(cpu_noarch): Add noavx512_bf16.
	* doc/c-i386.texi: Document avx512_bf16.
	* testsuite/gas/i386/avx512_bf16.d: New file.
	* testsuite/gas/i386/avx512_bf16.s: Likewise.
	* testsuite/gas/i386/avx512_bf16_vl-inval.l: Likewise.
	* testsuite/gas/i386/avx512_bf16_vl-inval.s: Likewise.
	* testsuite/gas/i386/avx512_bf16_vl.d: Likewise.
	* testsuite/gas/i386/avx512_bf16_vl.s: Likewise.
	* testsuite/gas/i386/x86-64-avx512_bf16.d: Likewise.
	* testsuite/gas/i386/x86-64-avx512_bf16.s: Likewise.
	* testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.l: Likewise.
	* testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.s: Likewise.
	* testsuite/gas/i386/x86-64-avx512_bf16_vl.d: Likewise.
	* testsuite/gas/i386/x86-64-avx512_bf16_vl.s: Likewise.
	* testsuite/gas/i386/i386.exp: Add BF16 related tests.

opcodes/

2019-04-05  Xuepeng Guo  <xuepeng.guo@intel.com>

	* i386-dis-evex.h (evex_table): Updated to support BF16
	instructions.
	* i386-dis.c (enum): Add EVEX_W_0F3852_P_1, EVEX_W_0F3872_P_1
	and EVEX_W_0F3872_P_3.
	* i386-gen.c (cpu_flag_init): Add CPU_AVX512_BF16_FLAGS.
	(cpu_flags): Add bitfield for CpuAVX512_BF16.
	* i386-opc.h (enum): Add CpuAVX512_BF16.
	(i386_cpu_flags): Add bitfield for cpuavx512_bf16.
	* i386-opc.tbl: Add AVX512 BF16 instructions.
	* i386-init.h: Regenerated.
	* i386-tbl.h: Likewise.
---
 gas/config/tc-i386.c                          |    3 +
 gas/doc/c-i386.texi                           |    4 +-
 gas/testsuite/gas/i386/avx512_bf16.d          |   42 +
 gas/testsuite/gas/i386/avx512_bf16.s          |   37 +
 gas/testsuite/gas/i386/avx512_bf16_vl-inval.l |    7 +
 gas/testsuite/gas/i386/avx512_bf16_vl-inval.s |   13 +
 gas/testsuite/gas/i386/avx512_bf16_vl.d       |   70 +
 gas/testsuite/gas/i386/avx512_bf16_vl.s       |   65 +
 gas/testsuite/gas/i386/i386.exp               |    6 +
 gas/testsuite/gas/i386/x86-64-avx512_bf16.d   |   42 +
 gas/testsuite/gas/i386/x86-64-avx512_bf16.s   |   37 +
 .../gas/i386/x86-64-avx512_bf16_vl-inval.l    |    7 +
 .../gas/i386/x86-64-avx512_bf16_vl-inval.s    |   13 +
 .../gas/i386/x86-64-avx512_bf16_vl.d          |   70 +
 .../gas/i386/x86-64-avx512_bf16_vl.s          |   65 +
 opcodes/i386-dis-evex.h                       |   20 +-
 opcodes/i386-dis.c                            |    3 +
 opcodes/i386-gen.c                            |    7 +-
 opcodes/i386-init.h                           |  378 +-
 opcodes/i386-opc.h                            |    3 +
 opcodes/i386-opc.tbl                          |   30 +
 opcodes/i386-tbl.h                            | 8254 +++++++++--------
 22 files changed, 5042 insertions(+), 4134 deletions(-)
 create mode 100644 gas/testsuite/gas/i386/avx512_bf16.d
 create mode 100644 gas/testsuite/gas/i386/avx512_bf16.s
 create mode 100644 gas/testsuite/gas/i386/avx512_bf16_vl-inval.l
 create mode 100644 gas/testsuite/gas/i386/avx512_bf16_vl-inval.s
 create mode 100644 gas/testsuite/gas/i386/avx512_bf16_vl.d
 create mode 100644 gas/testsuite/gas/i386/avx512_bf16_vl.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx512_bf16.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx512_bf16.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.l
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.s
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.d
 create mode 100644 gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.s

Comments

Jan Beulich April 8, 2019, 8:56 a.m. | #1
>>> On 05.04.19 at 20:01, <hongjiu.lu@intel.com> wrote:

> --- a/opcodes/i386-opc.tbl

> +++ b/opcodes/i386-opc.tbl

> @@ -4710,3 +4710,33 @@ movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|CpuNo64, Modrm|IgnoreSize|No_bSu

>  movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|AddrPrefixOpReg, { Unspecified|BaseIndex, Reg32|Reg64 }

>  

>  // MOVEDIR instructions end.

> +

> +// AVX512_BF16 instructions.

> +

> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }

> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }

> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex=1|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }


Here and below, can you please avoid introducing anew inefficient
table entries with partly irrelevant / redundant attributes:
- use Disp8ShiftVL (and no explicit EVex<NNN> / EVex=<N> nor
  CpuAVX512VL), thus folding the above three entries into one
- don't explicitly use [XYZ]mmWord (redundant with Reg[XYZ]MM,
  once folding with the non-broadcast forms is also done - see
  further down)
- presumably IgnoreSize is not needed
I hope I didn't forget further ones.
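
As a rough illustration of why a single Disp8ShiftVL entry can stand in for the three Disp8MemShift=4/5/6 entries, the sketch below derives the shift from the vector length and applies it to a displacement. The helper names `disp8_shift_for_vl` and `compress_disp8` are hypothetical and are not taken from gas or i386-gen; only the shift values and the displacements come from the quoted table entries and the patch's test files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: log2 of the vector length in bytes, i.e. the
   Disp8 shift for a full-vector memory operand (128 -> 4, 256 -> 5,
   512 -> 6).  Returns -1 for any other length.  */
int
disp8_shift_for_vl (int vl_bits)
{
  switch (vl_bits)
    {
    case 128: return 4;
    case 256: return 5;
    case 512: return 6;
    default:  return -1;
    }
}

/* Hypothetical helper: try to compress DISP into a scaled 8-bit
   displacement.  On success store the byte in *OUT and return true;
   otherwise return false, in which case an encoder would have to
   fall back to a 32-bit displacement.  */
bool
compress_disp8 (int32_t disp, int shift, int8_t *out)
{
  int32_t scale = (int32_t) 1 << shift;
  int32_t scaled;

  assert (out != NULL);

  /* The displacement must be a multiple of the operand size.  */
  if (disp % scale != 0)
    return false;

  /* The scaled value must fit in a signed byte.  */
  scaled = disp / scale;
  if (scaled < -128 || scaled > 127)
    return false;

  *out = (int8_t) scaled;
  return true;
}
```

With these helpers, 8128 with a 512-bit operand, 4064 with a 256-bit operand and 2032 with a 128-bit operand all compress to the byte 0x7f, which is consistent with the `71 7f` encodings expected in the patch's .d files.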

For brevity / readability I'd also like to recommend omitting "=1" in
insn attribute specifications. i386-gen assumes 1 when there's
no explicit value given.
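
The "defaults to 1" rule can be pictured with a small sketch. This is not the actual i386-gen code; `parse_attr` is a hypothetical helper that only shows how a token such as `VexOpcode` and a token such as `VexOpcode=1` can end up with the same value, while `Masking=3` keeps its explicit one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the actual i386-gen code: split one
   attribute token of the form "Name" or "Name=value".  The name is
   copied into NAME (at most NAME_SIZE - 1 characters) and the value
   is returned; a token without "=value" yields 1.  */
int
parse_attr (const char *token, char *name, size_t name_size)
{
  const char *eq = strchr (token, '=');
  size_t len = eq != NULL ? (size_t) (eq - token) : strlen (token);

  assert (name_size > 0);

  /* Truncate over-long names rather than overflow the buffer.  */
  if (len >= name_size)
    len = name_size - 1;
  memcpy (name, token, len);
  name[len] = '\0';

  /* No explicit value means the attribute is simply "on".  */
  return eq != NULL ? (int) strtol (eq + 1, NULL, 0) : 1;
}
```

Under this reading, `VexOpcode=1|VexVVVV=1` and `VexOpcode|VexVVVV` describe the same template, which is why the shorter spelling is preferred.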

> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }

> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }

> +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }


Why this second set of three entries? Broadcast is an optional
attribute, i.e. the first entries (really: entry) ought to cover
everything.
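
To make concrete what changes when the optional broadcast is actually used, here is a minimal sketch. It assumes 32-bit (Dword) broadcast elements, as in the quoted entries; `broadcast_factor_dword` and `mem_disp8_shift` are hypothetical names and do not exist in gas.

```c
#include <assert.h>

/* Hypothetical helper: the N in {1toN} for 32-bit elements and a
   given vector length in bits (128 -> 4, 256 -> 8, 512 -> 16).
   Returns 0 for any other length.  */
int
broadcast_factor_dword (int vl_bits)
{
  if (vl_bits != 128 && vl_bits != 256 && vl_bits != 512)
    return 0;
  return vl_bits / 32;
}

/* Hypothetical helper: Disp8 shift of a memory operand.  With
   broadcast only one Dword is read, so the shift is 2; without it
   the whole vector is read and the shift is log2 of its size in
   bytes.  */
int
mem_disp8_shift (int vl_bits, int broadcast)
{
  int bytes = vl_bits / 8;
  int shift = 0;

  assert (vl_bits == 128 || vl_bits == 256 || vl_bits == 512);

  if (broadcast)
    return 2;

  /* Integer log2 of the vector size in bytes.  */
  while (bytes > 1)
    {
      bytes >>= 1;
      shift++;
    }
  return shift;
}
```

The factors 16, 8 and 4 match the `{1to16}`, `{1to8}` and `{1to4}` operands in the patch's tests. The shift of 2 is also consistent with `-8192(%edx){1to16}` being emitted with a 32-bit displacement there: -8192 / 4 = -2048 does not fit in a signed byte.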

> +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }

> +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }

> +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }

> +// AVX512_BF16 instructions end.


Can we gain a blank line above this sentinel please?

Thanks, Jan
Jan Beulich April 8, 2019, 9:10 a.m. | #2
>>> On 05.04.19 at 20:01, <hongjiu.lu@intel.com> wrote:

> Add assembler and disassembler support for Intel AVX512 BF16:

> 

> https://software.intel.com/en-us/download/intel-architecture-instruction-set-extensions-programming-reference

> 

> gas/

> 

> 2019-04-05  Xuepeng Guo  <xuepeng.guo@intel.com>


And btw - having used this email address in my earlier reply
I got back:

The message that you sent was undeliverable to the following: 

	xuepeng.guo@intel.com (550 #5.1.0 Address rejected.)

Jan
H.J. Lu April 8, 2019, 6:17 p.m. | #3
On Mon, Apr 8, 2019 at 1:56 AM Jan Beulich <JBeulich@suse.com> wrote:
>

> >>> On 05.04.19 at 20:01, <hongjiu.lu@intel.com> wrote:

> > --- a/opcodes/i386-opc.tbl

> > +++ b/opcodes/i386-opc.tbl

> > @@ -4710,3 +4710,33 @@ movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|CpuNo64, Modrm|IgnoreSize|No_bSu

> >  movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|AddrPrefixOpReg, { Unspecified|BaseIndex, Reg32|Reg64 }

> >

> >  // MOVEDIR instructions end.

> > +

> > +// AVX512_BF16 instructions.

> > +

> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }

> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }

> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex=1|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }

>

> Here and below, can you please avoid introducing anew inefficient

> table entries with partly irrelevant / redundant attributes:

> - use Disp8ShiftVL (and no explicit EVex<NNN> / EVex=<N> nor

>   CpuAVX512VL), thus folding the above three entries into one

> - don't explicitly use [XYZ]mmWord (redundant with Reg[XYZ]MM,

>   once folding with the non-broadcast forms is also done - see

>   further down)

> - presumably IgnoreSize is not needed

> I hope I didn't forget further ones.


Done.

> For brevity / readability I'd also like to recommend omitting "=1" in

> insn attribute specifications. i386-gen assumes 1 when there's

> no explicit value given.


Done.

> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }

> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }

> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }

>

> Why this second set of three entries? Broadcast is an optional

> attribute, i.e. the first entries (really: entry) ought to cover

> everything.


Done.

> > +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }

> > +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }

> > +vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }

> > +// AVX512_BF16 instructions end.

>

> Can we gain a blank line above this sentinel please?

>


Done.

I am checking in this patch to update BF16 which was implemented by
Xuepeng before he left Intel.

Thanks.

-- 
H.J.
From 6f2791d5de45a9490ba6844617feac038c8da8bd Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Mon, 8 Apr 2019 11:06:04 -0700
Subject: [PATCH] x86: Consolidate AVX512 BF16 entries in i386-opc.tbl

1. Use single entry for vcvtne2ps2bf16 and vdpbf16ps with Disp8ShiftVL.
2. Use 5 entries, instead of 8, for vcvtneps2bf16.

	* i386-opc.tbl: Consolidate AVX512 BF16 entries.
	* i386-init.h: Regenerated.
---
 opcodes/ChangeLog    |   5 +
 opcodes/i386-opc.tbl |  29 ++---
 opcodes/i386-tbl.h   | 282 ++++---------------------------------------
 3 files changed, 34 insertions(+), 282 deletions(-)

diff --git a/opcodes/ChangeLog b/opcodes/ChangeLog
index bf775b5e3e..27557ebb08 100644
--- a/opcodes/ChangeLog
+++ b/opcodes/ChangeLog
@@ -1,3 +1,8 @@
+2019-04-08  H.J. Lu  <hongjiu.lu@intel.com>
+
+	* i386-opc.tbl: Consolidate AVX512 BF16 entries.
+	* i386-init.h: Regenerated.
+
 2019-04-07  Alan Modra  <amodra@gmail.com>
 
 	* ppc-dis.c (print_insn_powerpc): Use a tiny state machine
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 56fe4ef356..11ee240708 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -4713,30 +4713,15 @@ movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|Cpu64, Modrm|No_bSuf|No_wSuf|No_
 
 // AVX512_BF16 instructions.
 
-vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }
-vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex=1|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }
+vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode|VexVVVV|Masking=3|VexW0|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }
-vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex128|Masking=3|VexW0|Broadcast|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|Dword|BaseIndex, RegXMM }
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex256|Masking=3|VexW0|Broadcast|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM|Dword|BaseIndex, RegXMM }
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16, Modrm|VexOpcode|EVex512|Masking=3|VexW0|Broadcast|Disp8MemShift=6|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM|Dword|Unspecified|BaseIndex, RegYMM }
 
-vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|BaseIndex, RegXMM }
-vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|BaseIndex, RegXMM }
-vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegYMM }
+vcvtneps2bf16x, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex128|Masking=3|VexW0|Disp8MemShift=4|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Unspecified|BaseIndex, RegXMM }
+vcvtneps2bf16y, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode|EVex256|Masking=3|VexW0|Disp8MemShift=5|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Unspecified|BaseIndex, RegXMM }
 
-vcvtneps2bf16x, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|Masking=3|VexW0|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM }
-vcvtneps2bf16y, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|Masking=3|VexW0|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Dword|Ymmword|Unspecified|BaseIndex, RegXMM }
+vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode|VexVVVV|Masking=3|VexW0|Broadcast|Disp8ShiftVL|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM|RegYMM|RegZMM|Dword|Unspecified|BaseIndex, RegXMM|RegYMM|RegZMM, RegXMM|RegYMM|RegZMM }
 
-vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
-vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegXMM }
-vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegYMM }
-
-vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }
-vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }
-vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }
-
-vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
-vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }
-vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }
 // AVX512_BF16 instructions end.
Jan Beulich April 9, 2019, 9:21 a.m. | #4
>>> On 08.04.19 at 20:17, <hjl.tools@gmail.com> wrote:

> On Mon, Apr 8, 2019 at 1:56 AM Jan Beulich <JBeulich@suse.com> wrote:

>>

>> >>> On 05.04.19 at 20:01, <hongjiu.lu@intel.com> wrote:

>> > --- a/opcodes/i386-opc.tbl

>> > +++ b/opcodes/i386-opc.tbl

>> > @@ -4710,3 +4710,33 @@ movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|CpuNo64, Modrm|IgnoreSize|No_bSu

>> >  movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|AddrPrefixOpReg, { Unspecified|BaseIndex, Reg32|Reg64 }

>> >

>> >  // MOVEDIR instructions end.

>> > +

>> > +// AVX512_BF16 instructions.

>> > +

>> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }

>> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }

>> > +vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex=1|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }

>>

>> Here and below, can you please avoid introducing anew inefficient

>> table entries with partly irrelevant / redundant attributes:

>> - use Disp8ShiftVL (and no explicit EVex<NNN> / EVex=<N> nor

>>   CpuAVX512VL), thus folding the above three entries into one

>> - don't explicitly use [XYZ]mmWord (redundant with Reg[XYZ]MM,

>>   once folding with the non-broadcast forms is also done - see

>>   further down)

>> - presumably IgnoreSize is not needed

>> I hope I didn't forget further ones.

> 

> Done.


Much better, thanks!

> I am checking in this patch to update BF16 which was implemented by

> Xuepeng before he left Intel.


Oh, I see.

Jan

Patch

diff --git a/gas/config/tc-i386.c b/gas/config/tc-i386.c
index 690fd23ff0..ed8cfe1ad3 100644
--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1080,6 +1080,8 @@  static const arch_entry cpu_arch[] =
     CPU_MOVDIRI_FLAGS, 0 },
   { STRING_COMMA_LEN (".movdir64b"), PROCESSOR_UNKNOWN,
     CPU_MOVDIR64B_FLAGS, 0 },
+  { STRING_COMMA_LEN (".avx512_bf16"), PROCESSOR_UNKNOWN,
+    CPU_AVX512_BF16_FLAGS, 0 },
 };
 
 static const noarch_entry cpu_noarch[] =
@@ -1119,6 +1121,7 @@  static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("noshstk"), CPU_ANY_SHSTK_FLAGS },
   { STRING_COMMA_LEN ("nomovdiri"), CPU_ANY_MOVDIRI_FLAGS },
   { STRING_COMMA_LEN ("nomovdir64b"), CPU_ANY_MOVDIR64B_FLAGS },
+  { STRING_COMMA_LEN ("noavx512_bf16"), CPU_ANY_AVX512_BF16_FLAGS },
 };
 
 #ifdef I386COFF
diff --git a/gas/doc/c-i386.texi b/gas/doc/c-i386.texi
index 4acd5ff616..86cde7907e 100644
--- a/gas/doc/c-i386.texi
+++ b/gas/doc/c-i386.texi
@@ -198,6 +198,7 @@  accept various extension mnemonics.  For example,
 @code{avx512_vbmi2},
 @code{avx512_vnni},
 @code{avx512_bitalg},
+@code{avx512_bf16},
 @code{noavx512f},
 @code{noavx512cd},
 @code{noavx512er},
@@ -213,6 +214,7 @@  accept various extension mnemonics.  For example,
 @code{noavx512_vbmi2},
 @code{noavx512_vnni},
 @code{noavx512_bitalg},
+@code{noavx512_bf16},
 @code{vmx},
 @code{vmfunc},
 @code{smx},
@@ -1303,7 +1305,7 @@  supported on the CPU specified.  The choices for @var{cpu_type} are:
 @item @samp{.avx512vl} @tab @samp{.avx512bw} @tab @samp{.avx512dq} @tab @samp{.avx512ifma}
 @item @samp{.avx512vbmi} @tab @samp{.avx512_4fmaps} @tab @samp{.avx512_4vnniw}
 @item @samp{.avx512_vpopcntdq} @tab @samp{.avx512_vbmi2} @tab @samp{.avx512_vnni}
-@item @samp{.avx512_bitalg}
+@item @samp{.avx512_bitalg} @tab @samp{.avx512_bf16}
 @item @samp{.clwb} @tab @samp{.rdpid} @tab @samp{.ptwrite} @tab @item @samp{.ibt}
 @item @samp{.wbnoinvd} @tab @samp{.pconfig} @tab @samp{.waitpkg} @tab @samp{.cldemote}
 @item @samp{.shstk} @tab @samp{.gfni} @tab @samp{.vaes} @tab @samp{.vpclmulqdq}
diff --git a/gas/testsuite/gas/i386/avx512_bf16.d b/gas/testsuite/gas/i386/avx512_bf16.d
new file mode 100644
index 0000000000..e988ce89a2
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx512_bf16.d
@@ -0,0 +1,42 @@ 
+#as:
+#objdump: -dw
+#name: i386 BF16 insns
+#source: avx512_bf16.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:	62 f2 57 48 72 f4    	vcvtne2ps2bf16 %zmm4,%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 57 4f 72 b4 f4 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%esp,%esi,8\),%zmm5,%zmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 57 58 72 31    	vcvtne2ps2bf16 \(%ecx\)\{1to16\},%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 57 48 72 71 7f 	vcvtne2ps2bf16 0x1fc0\(%ecx\),%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 57 df 72 b2 00 e0 ff ff 	vcvtne2ps2bf16 -0x2000\(%edx\)\{1to16\},%zmm5,%zmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 7e 48 72 f5    	vcvtneps2bf16 %zmm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 7e 4f 72 b4 f4 00 00 00 10 	vcvtneps2bf16 0x10000000\(%esp,%esi,8\),%ymm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 7e 58 72 31    	vcvtneps2bf16 \(%ecx\)\{1to16\},%ymm6
+[ 	]*[a-f0-9]+:	62 f2 7e 48 72 71 7f 	vcvtneps2bf16 0x1fc0\(%ecx\),%ymm6
+[ 	]*[a-f0-9]+:	62 f2 7e df 72 b2 00 e0 ff ff 	vcvtneps2bf16 -0x2000\(%edx\)\{1to16\},%ymm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 56 48 52 f4    	vdpbf16ps %zmm4,%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 56 4f 52 b4 f4 00 00 00 10 	vdpbf16ps 0x10000000\(%esp,%esi,8\),%zmm5,%zmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 56 58 52 31    	vdpbf16ps \(%ecx\)\{1to16\},%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 56 48 52 71 7f 	vdpbf16ps 0x1fc0\(%ecx\),%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 56 df 52 b2 00 e0 ff ff 	vdpbf16ps -0x2000\(%edx\)\{1to16\},%zmm5,%zmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 57 48 72 f4    	vcvtne2ps2bf16 %zmm4,%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 57 4f 72 b4 f4 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%esp,%esi,8\),%zmm5,%zmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 57 58 72 31    	vcvtne2ps2bf16 \(%ecx\)\{1to16\},%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 57 48 72 71 7f 	vcvtne2ps2bf16 0x1fc0\(%ecx\),%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 57 df 72 b2 00 e0 ff ff 	vcvtne2ps2bf16 -0x2000\(%edx\)\{1to16\},%zmm5,%zmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 7e 48 72 f5    	vcvtneps2bf16 %zmm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 7e 4f 72 b4 f4 00 00 00 10 	vcvtneps2bf16 0x10000000\(%esp,%esi,8\),%ymm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 7e 58 72 31    	vcvtneps2bf16 \(%ecx\)\{1to16\},%ymm6
+[ 	]*[a-f0-9]+:	62 f2 7e 48 72 71 7f 	vcvtneps2bf16 0x1fc0\(%ecx\),%ymm6
+[ 	]*[a-f0-9]+:	62 f2 7e df 72 b2 00 e0 ff ff 	vcvtneps2bf16 -0x2000\(%edx\)\{1to16\},%ymm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 56 48 52 f4    	vdpbf16ps %zmm4,%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 56 4f 52 b4 f4 00 00 00 10 	vdpbf16ps 0x10000000\(%esp,%esi,8\),%zmm5,%zmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 56 58 52 31    	vdpbf16ps \(%ecx\)\{1to16\},%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 56 48 52 71 7f 	vdpbf16ps 0x1fc0\(%ecx\),%zmm5,%zmm6
+[ 	]*[a-f0-9]+:	62 f2 56 df 52 b2 00 e0 ff ff 	vdpbf16ps -0x2000\(%edx\)\{1to16\},%zmm5,%zmm6\{%k7\}\{z\}
+#pass
diff --git a/gas/testsuite/gas/i386/avx512_bf16.s b/gas/testsuite/gas/i386/avx512_bf16.s
new file mode 100644
index 0000000000..74f60d2058
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx512_bf16.s
@@ -0,0 +1,37 @@ 
+# Check 32bit AVX512_BF16 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vcvtne2ps2bf16	%zmm4, %zmm5, %zmm6	 #AVX512_BF16
+	vcvtne2ps2bf16	0x10000000(%esp, %esi, 8), %zmm5, %zmm6{%k7}	 #AVX512_BF16 MASK_ENABLING
+	vcvtne2ps2bf16	(%ecx){1to16}, %zmm5, %zmm6	 #AVX512_BF16 BROADCAST_EN
+	vcvtne2ps2bf16	8128(%ecx), %zmm5, %zmm6	 #AVX512_BF16 Disp8
+	vcvtne2ps2bf16	-8192(%edx){1to16}, %zmm5, %zmm6{%k7}{z}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	%zmm5, %ymm6	 #AVX512_BF16
+	vcvtneps2bf16	0x10000000(%esp, %esi, 8), %ymm6{%k7}	 #AVX512_BF16 MASK_ENABLING
+	vcvtneps2bf16	(%ecx){1to16}, %ymm6	 #AVX512_BF16 BROADCAST_EN
+	vcvtneps2bf16	8128(%ecx), %ymm6	 #AVX512_BF16 Disp8
+	vcvtneps2bf16	-8192(%edx){1to16}, %ymm6{%k7}{z}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	%zmm4, %zmm5, %zmm6	 #AVX512_BF16
+	vdpbf16ps	0x10000000(%esp, %esi, 8), %zmm5, %zmm6{%k7}	 #AVX512_BF16 MASK_ENABLING
+	vdpbf16ps	(%ecx){1to16}, %zmm5, %zmm6	 #AVX512_BF16 BROADCAST_EN
+	vdpbf16ps	8128(%ecx), %zmm5, %zmm6	 #AVX512_BF16 Disp8
+	vdpbf16ps	-8192(%edx){1to16}, %zmm5, %zmm6{%k7}{z}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+
+.intel_syntax noprefix
+	vcvtne2ps2bf16	zmm6, zmm5, zmm4	 #AVX512_BF16
+	vcvtne2ps2bf16	zmm6{k7}, zmm5, ZMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512_BF16 MASK_ENABLING
+	vcvtne2ps2bf16	zmm6, zmm5, DWORD PTR [ecx]{1to16}	 #AVX512_BF16 BROADCAST_EN
+	vcvtne2ps2bf16	zmm6, zmm5, ZMMWORD PTR [ecx+8128]	 #AVX512_BF16 Disp8
+	vcvtne2ps2bf16	zmm6{k7}{z}, zmm5, DWORD PTR [edx-8192]{1to16}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	ymm6, zmm5	 #AVX512_BF16
+	vcvtneps2bf16	ymm6{k7}, ZMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512_BF16 MASK_ENABLING
+	vcvtneps2bf16	ymm6, DWORD PTR [ecx]{1to16}	 #AVX512_BF16 BROADCAST_EN
+	vcvtneps2bf16	ymm6, ZMMWORD PTR [ecx+8128]	 #AVX512_BF16 Disp8
+	vcvtneps2bf16	ymm6{k7}{z}, DWORD PTR [edx-8192]{1to16}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	zmm6, zmm5, zmm4	 #AVX512_BF16
+	vdpbf16ps	zmm6{k7}, zmm5, ZMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512_BF16 MASK_ENABLING
+	vdpbf16ps	zmm6, zmm5, DWORD PTR [ecx]{1to16}	 #AVX512_BF16 BROADCAST_EN
+	vdpbf16ps	zmm6, zmm5, ZMMWORD PTR [ecx+8128]	 #AVX512_BF16 Disp8
+	vdpbf16ps	zmm6{k7}{z}, zmm5, DWORD PTR [edx-8192]{1to16}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
diff --git a/gas/testsuite/gas/i386/avx512_bf16_vl-inval.l b/gas/testsuite/gas/i386/avx512_bf16_vl-inval.l
new file mode 100644
index 0000000000..dfd21d6692
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx512_bf16_vl-inval.l
@@ -0,0 +1,7 @@ 
+.*: Assembler messages:
+.*:6: Error: .*
+.*:7: Error: .*
+.*:8: Error: .*
+.*:11: Error: .*
+.*:12: Error: .*
+.*:13: Error: .*
diff --git a/gas/testsuite/gas/i386/avx512_bf16_vl-inval.s b/gas/testsuite/gas/i386/avx512_bf16_vl-inval.s
new file mode 100644
index 0000000000..e9e36b0064
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx512_bf16_vl-inval.s
@@ -0,0 +1,13 @@ 
+# Check illegal AVX512{BF16,VL} instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vcvtneps2bf16	0x10000000(%rbp, %r14, 8), %xmm3{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	2032(%rcx), %xmm3	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	4064(%rcx), %xmm3	 #AVX512{BF16,VL} Disp8
+
+.intel_syntax noprefix
+	vcvtneps2bf16	xmm3{k7}, [rbp+r14*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	xmm3, [rcx+2032]	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	xmm3, [rcx+4064]	 #AVX512{BF16,VL} Disp8
diff --git a/gas/testsuite/gas/i386/avx512_bf16_vl.d b/gas/testsuite/gas/i386/avx512_bf16_vl.d
new file mode 100644
index 0000000000..1467bc3767
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx512_bf16_vl.d
@@ -0,0 +1,70 @@ 
+#as:
+#objdump: -dw
+#name: i386 BF16 VL insns
+#source: avx512_bf16_vl.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:	62 f2 57 28 72 f4    	vcvtne2ps2bf16 %ymm4,%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 57 08 72 f4    	vcvtne2ps2bf16 %xmm4,%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 57 2f 72 b4 f4 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 57 38 72 31    	vcvtne2ps2bf16 \(%ecx\)\{1to8\},%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 57 28 72 71 7f 	vcvtne2ps2bf16 0xfe0\(%ecx\),%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 57 bf 72 b2 00 f0 ff ff 	vcvtne2ps2bf16 -0x1000\(%edx\)\{1to8\},%ymm5,%ymm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 57 0f 72 b4 f4 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 57 18 72 31    	vcvtne2ps2bf16 \(%ecx\)\{1to4\},%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 57 08 72 71 7f 	vcvtne2ps2bf16 0x7f0\(%ecx\),%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 57 9f 72 b2 00 f8 ff ff 	vcvtne2ps2bf16 -0x800\(%edx\)\{1to4\},%xmm5,%xmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 7e 08 72 f5    	vcvtneps2bf16 %xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 28 72 f5    	vcvtneps2bf16 %ymm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 0f 72 b4 f4 00 00 00 10 	vcvtneps2bf16x 0x10000000\(%esp,%esi,8\),%xmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 7e 18 72 31    	vcvtneps2bf16 \(%ecx\)\{1to4\},%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 08 72 71 7f 	vcvtneps2bf16x 0x7f0\(%ecx\),%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 9f 72 b2 00 f8 ff ff 	vcvtneps2bf16 -0x800\(%edx\)\{1to4\},%xmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 7e 38 72 31    	vcvtneps2bf16 \(%ecx\)\{1to8\},%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 28 72 71 7f 	vcvtneps2bf16y 0xfe0\(%ecx\),%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e bf 72 b2 00 f0 ff ff 	vcvtneps2bf16 -0x1000\(%edx\)\{1to8\},%xmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 56 28 52 f4    	vdpbf16ps %ymm4,%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 56 08 52 f4    	vdpbf16ps %xmm4,%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 56 2f 52 b4 f4 00 00 00 10 	vdpbf16ps 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 56 38 52 31    	vdpbf16ps \(%ecx\)\{1to8\},%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 56 28 52 71 7f 	vdpbf16ps 0xfe0\(%ecx\),%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 56 bf 52 b2 00 f0 ff ff 	vdpbf16ps -0x1000\(%edx\)\{1to8\},%ymm5,%ymm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 56 0f 52 b4 f4 00 00 00 10 	vdpbf16ps 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 56 18 52 31    	vdpbf16ps \(%ecx\)\{1to4\},%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 56 08 52 71 7f 	vdpbf16ps 0x7f0\(%ecx\),%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 56 9f 52 b2 00 f8 ff ff 	vdpbf16ps -0x800\(%edx\)\{1to4\},%xmm5,%xmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 57 28 72 f4    	vcvtne2ps2bf16 %ymm4,%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 57 08 72 f4    	vcvtne2ps2bf16 %xmm4,%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 57 2f 72 b4 f4 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 57 38 72 31    	vcvtne2ps2bf16 \(%ecx\)\{1to8\},%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 57 28 72 71 7f 	vcvtne2ps2bf16 0xfe0\(%ecx\),%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 57 bf 72 b2 00 f0 ff ff 	vcvtne2ps2bf16 -0x1000\(%edx\)\{1to8\},%ymm5,%ymm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 57 0f 72 b4 f4 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 57 18 72 31    	vcvtne2ps2bf16 \(%ecx\)\{1to4\},%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 57 08 72 71 7f 	vcvtne2ps2bf16 0x7f0\(%ecx\),%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 57 9f 72 b2 00 f8 ff ff 	vcvtne2ps2bf16 -0x800\(%edx\)\{1to4\},%xmm5,%xmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 7e 08 72 f5    	vcvtneps2bf16 %xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 28 72 f5    	vcvtneps2bf16 %ymm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 0f 72 b4 f4 00 00 00 10 	vcvtneps2bf16x 0x10000000\(%esp,%esi,8\),%xmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 7e 18 72 31    	vcvtneps2bf16 \(%ecx\)\{1to4\},%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 08 72 71 7f 	vcvtneps2bf16x 0x7f0\(%ecx\),%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 9f 72 b2 00 f8 ff ff 	vcvtneps2bf16 -0x800\(%edx\)\{1to4\},%xmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 7e 38 72 31    	vcvtneps2bf16 \(%ecx\)\{1to8\},%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e 28 72 71 7f 	vcvtneps2bf16y 0xfe0\(%ecx\),%xmm6
+[ 	]*[a-f0-9]+:	62 f2 7e bf 72 b2 00 f0 ff ff 	vcvtneps2bf16 -0x1000\(%edx\)\{1to8\},%xmm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 56 28 52 f4    	vdpbf16ps %ymm4,%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 56 08 52 f4    	vdpbf16ps %xmm4,%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 56 2f 52 b4 f4 00 00 00 10 	vdpbf16ps 0x10000000\(%esp,%esi,8\),%ymm5,%ymm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 56 38 52 31    	vdpbf16ps \(%ecx\)\{1to8\},%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 56 28 52 71 7f 	vdpbf16ps 0xfe0\(%ecx\),%ymm5,%ymm6
+[ 	]*[a-f0-9]+:	62 f2 56 bf 52 b2 00 f0 ff ff 	vdpbf16ps -0x1000\(%edx\)\{1to8\},%ymm5,%ymm6\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 f2 56 0f 52 b4 f4 00 00 00 10 	vdpbf16ps 0x10000000\(%esp,%esi,8\),%xmm5,%xmm6\{%k7\}
+[ 	]*[a-f0-9]+:	62 f2 56 18 52 31    	vdpbf16ps \(%ecx\)\{1to4\},%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 56 08 52 71 7f 	vdpbf16ps 0x7f0\(%ecx\),%xmm5,%xmm6
+[ 	]*[a-f0-9]+:	62 f2 56 9f 52 b2 00 f8 ff ff 	vdpbf16ps -0x800\(%edx\)\{1to4\},%xmm5,%xmm6\{%k7\}\{z\}
+#pass
diff --git a/gas/testsuite/gas/i386/avx512_bf16_vl.s b/gas/testsuite/gas/i386/avx512_bf16_vl.s
new file mode 100644
index 0000000000..7872765b77
--- /dev/null
+++ b/gas/testsuite/gas/i386/avx512_bf16_vl.s
@@ -0,0 +1,65 @@ 
+# Check 32bit AVX512{BF16,VL} instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vcvtne2ps2bf16	%ymm4, %ymm5, %ymm6	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	%xmm4, %xmm5, %xmm6	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	0x10000000(%esp, %esi, 8), %ymm5, %ymm6{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	(%ecx){1to8}, %ymm5, %ymm6	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	4064(%ecx), %ymm5, %ymm6	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	-4096(%edx){1to8}, %ymm5, %ymm6{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtne2ps2bf16	0x10000000(%esp, %esi, 8), %xmm5, %xmm6{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	(%ecx){1to4}, %xmm5, %xmm6	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	2032(%ecx), %xmm5, %xmm6	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	-2048(%edx){1to4}, %xmm5, %xmm6{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	%xmm5, %xmm6	 #AVX512{BF16,VL}
+	vcvtneps2bf16	%ymm5, %xmm6	 #AVX512{BF16,VL}
+	vcvtneps2bf16x	0x10000000(%esp, %esi, 8), %xmm6{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	(%ecx){1to4}, %xmm6	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16x	2032(%ecx), %xmm6	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	-2048(%edx){1to4}, %xmm6{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	(%ecx){1to8}, %xmm6	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16y	4064(%ecx), %xmm6	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	-4096(%edx){1to8}, %xmm6{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	%ymm4, %ymm5, %ymm6	 #AVX512{BF16,VL}
+	vdpbf16ps	%xmm4, %xmm5, %xmm6	 #AVX512{BF16,VL}
+	vdpbf16ps	0x10000000(%esp, %esi, 8), %ymm5, %ymm6{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	(%ecx){1to8}, %ymm5, %ymm6	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	4064(%ecx), %ymm5, %ymm6	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	-4096(%edx){1to8}, %ymm5, %ymm6{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	0x10000000(%esp, %esi, 8), %xmm5, %xmm6{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	(%ecx){1to4}, %xmm5, %xmm6	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	2032(%ecx), %xmm5, %xmm6	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	-2048(%edx){1to4}, %xmm5, %xmm6{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+
+.intel_syntax noprefix
+	vcvtne2ps2bf16	ymm6, ymm5, ymm4	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	xmm6, xmm5, xmm4	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	ymm6{k7}, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	ymm6, ymm5, DWORD PTR [ecx]{1to8}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	ymm6{k7}{z}, ymm5, DWORD PTR [edx-4096]{1to8}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtne2ps2bf16	xmm6{k7}, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	xmm6, xmm5, DWORD PTR [ecx]{1to4}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	xmm6{k7}{z}, xmm5, DWORD PTR [edx-2048]{1to4}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	xmm6, xmm5	 #AVX512{BF16,VL}
+	vcvtneps2bf16	xmm6, ymm5	 #AVX512{BF16,VL}
+	vcvtneps2bf16	xmm6{k7}, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	xmm6, DWORD PTR [ecx]{1to4}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16	xmm6, XMMWORD PTR [ecx+2032]	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	xmm6{k7}{z}, DWORD PTR [edx-2048]{1to4}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	xmm6, DWORD PTR [ecx]{1to8}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16	xmm6, YMMWORD PTR [ecx+4064]	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	xmm6{k7}{z}, DWORD PTR [edx-4096]{1to8}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	ymm6, ymm5, ymm4	 #AVX512{BF16,VL}
+	vdpbf16ps	xmm6, xmm5, xmm4	 #AVX512{BF16,VL}
+	vdpbf16ps	ymm6{k7}, ymm5, YMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	ymm6, ymm5, DWORD PTR [ecx]{1to8}	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	ymm6, ymm5, YMMWORD PTR [ecx+4064]	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	ymm6{k7}{z}, ymm5, DWORD PTR [edx-4096]{1to8}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	xmm6{k7}, xmm5, XMMWORD PTR [esp+esi*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	xmm6, xmm5, DWORD PTR [ecx]{1to4}	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	xmm6, xmm5, XMMWORD PTR [ecx+2032]	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	xmm6{k7}{z}, xmm5, DWORD PTR [edx-2048]{1to4}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
diff --git a/gas/testsuite/gas/i386/i386.exp b/gas/testsuite/gas/i386/i386.exp
index afb6116d96..1dd131334e 100644
--- a/gas/testsuite/gas/i386/i386.exp
+++ b/gas/testsuite/gas/i386/i386.exp
@@ -423,6 +423,9 @@  if [expr ([istarget "i*86-*-*"] ||  [istarget "x86_64-*-*"]) && [gas_32_check]]
     run_dump_test "avx512bitalg-intel"
     run_dump_test "avx512bitalg_vl"
     run_dump_test "avx512bitalg_vl-intel"
+    run_dump_test "avx512_bf16"
+    run_dump_test "avx512_bf16_vl"
+    run_list_test "avx512_bf16_vl-inval"
     run_list_test "sg"
     run_dump_test "clzero"
     run_dump_test "disassem"
@@ -937,6 +940,9 @@  if [expr ([istarget "i*86-*-*"] || [istarget "x86_64-*-*"]) && [gas_64_check]] t
     run_dump_test "x86-64-avx512bitalg-intel"
     run_dump_test "x86-64-avx512bitalg_vl"
     run_dump_test "x86-64-avx512bitalg_vl-intel"
+    run_dump_test "x86-64-avx512_bf16"
+    run_dump_test "x86-64-avx512_bf16_vl"
+    run_list_test "x86-64-avx512_bf16_vl-inval"
     run_dump_test "x86-64-clzero"
     run_dump_test "x86-64-mwaitx-bdver4"
     run_list_test "x86-64-mwaitx-reg"
diff --git a/gas/testsuite/gas/i386/x86-64-avx512_bf16.d b/gas/testsuite/gas/i386/x86-64-avx512_bf16.d
new file mode 100644
index 0000000000..2b3c27f696
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx512_bf16.d
@@ -0,0 +1,42 @@ 
+#as:
+#objdump: -dw
+#name: x86-64 BF16 insns
+#source: x86-64-avx512_bf16.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:	62 02 17 40 72 f4    	vcvtne2ps2bf16 %zmm28,%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 22 17 47 72 b4 f5 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%rbp,%r14,8\),%zmm29,%zmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 17 50 72 31    	vcvtne2ps2bf16 \(%r9\)\{1to16\},%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 17 40 72 71 7f 	vcvtne2ps2bf16 0x1fc0\(%rcx\),%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 17 d7 72 b2 00 e0 ff ff 	vcvtne2ps2bf16 -0x2000\(%rdx\)\{1to16\},%zmm29,%zmm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 7e 48 72 f5    	vcvtneps2bf16 %zmm29,%ymm30
+[ 	]*[a-f0-9]+:	62 22 7e 4f 72 b4 f5 00 00 00 10 	vcvtneps2bf16 0x10000000\(%rbp,%r14,8\),%ymm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 7e 58 72 31    	vcvtneps2bf16 \(%r9\)\{1to16\},%ymm30
+[ 	]*[a-f0-9]+:	62 62 7e 48 72 71 7f 	vcvtneps2bf16 0x1fc0\(%rcx\),%ymm30
+[ 	]*[a-f0-9]+:	62 62 7e df 72 b2 00 e0 ff ff 	vcvtneps2bf16 -0x2000\(%rdx\)\{1to16\},%ymm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 16 40 52 f4    	vdpbf16ps %zmm28,%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 22 16 47 52 b4 f5 00 00 00 10 	vdpbf16ps 0x10000000\(%rbp,%r14,8\),%zmm29,%zmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 16 50 52 31    	vdpbf16ps \(%r9\)\{1to16\},%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 16 40 52 71 7f 	vdpbf16ps 0x1fc0\(%rcx\),%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 16 d7 52 b2 00 e0 ff ff 	vdpbf16ps -0x2000\(%rdx\)\{1to16\},%zmm29,%zmm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 17 40 72 f4    	vcvtne2ps2bf16 %zmm28,%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 22 17 47 72 b4 f5 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%rbp,%r14,8\),%zmm29,%zmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 17 50 72 31    	vcvtne2ps2bf16 \(%r9\)\{1to16\},%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 17 40 72 71 7f 	vcvtne2ps2bf16 0x1fc0\(%rcx\),%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 17 d7 72 b2 00 e0 ff ff 	vcvtne2ps2bf16 -0x2000\(%rdx\)\{1to16\},%zmm29,%zmm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 7e 48 72 f5    	vcvtneps2bf16 %zmm29,%ymm30
+[ 	]*[a-f0-9]+:	62 22 7e 4f 72 b4 f5 00 00 00 10 	vcvtneps2bf16 0x10000000\(%rbp,%r14,8\),%ymm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 7e 58 72 31    	vcvtneps2bf16 \(%r9\)\{1to16\},%ymm30
+[ 	]*[a-f0-9]+:	62 62 7e 48 72 71 7f 	vcvtneps2bf16 0x1fc0\(%rcx\),%ymm30
+[ 	]*[a-f0-9]+:	62 62 7e df 72 b2 00 e0 ff ff 	vcvtneps2bf16 -0x2000\(%rdx\)\{1to16\},%ymm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 16 40 52 f4    	vdpbf16ps %zmm28,%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 22 16 47 52 b4 f5 00 00 00 10 	vdpbf16ps 0x10000000\(%rbp,%r14,8\),%zmm29,%zmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 16 50 52 31    	vdpbf16ps \(%r9\)\{1to16\},%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 16 40 52 71 7f 	vdpbf16ps 0x1fc0\(%rcx\),%zmm29,%zmm30
+[ 	]*[a-f0-9]+:	62 62 16 d7 52 b2 00 e0 ff ff 	vdpbf16ps -0x2000\(%rdx\)\{1to16\},%zmm29,%zmm30\{%k7\}\{z\}
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-avx512_bf16.s b/gas/testsuite/gas/i386/x86-64-avx512_bf16.s
new file mode 100644
index 0000000000..5dc3b5e14c
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx512_bf16.s
@@ -0,0 +1,37 @@ 
+# Check 64bit AVX512_BF16 instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vcvtne2ps2bf16	%zmm28, %zmm29, %zmm30	 #AVX512_BF16
+	vcvtne2ps2bf16	0x10000000(%rbp, %r14, 8), %zmm29, %zmm30{%k7}	 #AVX512_BF16 MASK_ENABLING
+	vcvtne2ps2bf16	(%r9){1to16}, %zmm29, %zmm30	 #AVX512_BF16 BROADCAST_EN
+	vcvtne2ps2bf16	8128(%rcx), %zmm29, %zmm30	 #AVX512_BF16 Disp8
+	vcvtne2ps2bf16	-8192(%rdx){1to16}, %zmm29, %zmm30{%k7}{z}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	%zmm29, %ymm30	 #AVX512_BF16
+	vcvtneps2bf16	0x10000000(%rbp, %r14, 8), %ymm30{%k7}	 #AVX512_BF16 MASK_ENABLING
+	vcvtneps2bf16	(%r9){1to16}, %ymm30	 #AVX512_BF16 BROADCAST_EN
+	vcvtneps2bf16	8128(%rcx), %ymm30	 #AVX512_BF16 Disp8
+	vcvtneps2bf16	-8192(%rdx){1to16}, %ymm30{%k7}{z}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	%zmm28, %zmm29, %zmm30	 #AVX512_BF16
+	vdpbf16ps	0x10000000(%rbp, %r14, 8), %zmm29, %zmm30{%k7}	 #AVX512_BF16 MASK_ENABLING
+	vdpbf16ps	(%r9){1to16}, %zmm29, %zmm30	 #AVX512_BF16 BROADCAST_EN
+	vdpbf16ps	8128(%rcx), %zmm29, %zmm30	 #AVX512_BF16 Disp8
+	vdpbf16ps	-8192(%rdx){1to16}, %zmm29, %zmm30{%k7}{z}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+
+.intel_syntax noprefix
+	vcvtne2ps2bf16	zmm30, zmm29, zmm28	 #AVX512_BF16
+	vcvtne2ps2bf16	zmm30{k7}, zmm29, ZMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512_BF16 MASK_ENABLING
+	vcvtne2ps2bf16	zmm30, zmm29, DWORD PTR [r9]{1to16}	 #AVX512_BF16 BROADCAST_EN
+	vcvtne2ps2bf16	zmm30, zmm29, ZMMWORD PTR [rcx+8128]	 #AVX512_BF16 Disp8
+	vcvtne2ps2bf16	zmm30{k7}{z}, zmm29, DWORD PTR [rdx-8192]{1to16}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	ymm30, zmm29	 #AVX512_BF16
+	vcvtneps2bf16	ymm30{k7}, ZMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512_BF16 MASK_ENABLING
+	vcvtneps2bf16	ymm30, DWORD PTR [r9]{1to16}	 #AVX512_BF16 BROADCAST_EN
+	vcvtneps2bf16	ymm30, ZMMWORD PTR [rcx+8128]	 #AVX512_BF16 Disp8
+	vcvtneps2bf16	ymm30{k7}{z}, DWORD PTR [rdx-8192]{1to16}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	zmm30, zmm29, zmm28	 #AVX512_BF16
+	vdpbf16ps	zmm30{k7}, zmm29, ZMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512_BF16 MASK_ENABLING
+	vdpbf16ps	zmm30, zmm29, DWORD PTR [r9]{1to16}	 #AVX512_BF16 BROADCAST_EN
+	vdpbf16ps	zmm30, zmm29, ZMMWORD PTR [rcx+8128]	 #AVX512_BF16 Disp8
+	vdpbf16ps	zmm30{k7}{z}, zmm29, DWORD PTR [rdx-8192]{1to16}	 #AVX512_BF16 Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
diff --git a/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.l b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.l
new file mode 100644
index 0000000000..dfd21d6692
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.l
@@ -0,0 +1,7 @@ 
+.*: Assembler messages:
+.*:6: Error: .*
+.*:7: Error: .*
+.*:8: Error: .*
+.*:11: Error: .*
+.*:12: Error: .*
+.*:13: Error: .*
diff --git a/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.s b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.s
new file mode 100644
index 0000000000..045511b803
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl-inval.s
@@ -0,0 +1,13 @@ 
+# Check illegal 64bit AVX512{BF16,VL} instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vcvtneps2bf16	0x10000000(%rbp, %r14, 8), %xmm30{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	2032(%rcx), %xmm30	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	4064(%rcx), %xmm30	 #AVX512{BF16,VL} Disp8
+
+.intel_syntax noprefix
+	vcvtneps2bf16	xmm30{k7}, [rbp+r14*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	xmm30, [rcx+2032]	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	xmm30, [rcx+4064]	 #AVX512{BF16,VL} Disp8
diff --git a/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.d b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.d
new file mode 100644
index 0000000000..43810a6ac6
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.d
@@ -0,0 +1,70 @@ 
+#as:
+#objdump: -dw
+#name: x86-64 BF16 VL insns
+#source: x86-64-avx512_bf16_vl.s
+
+.*: +file format .*
+
+
+Disassembly of section .text:
+
+0+ <_start>:
+[ 	]*[a-f0-9]+:	62 02 17 20 72 f4    	vcvtne2ps2bf16 %ymm28,%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 02 17 00 72 f4    	vcvtne2ps2bf16 %xmm28,%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 22 17 27 72 b4 f5 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%rbp,%r14,8\),%ymm29,%ymm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 17 30 72 31    	vcvtne2ps2bf16 \(%r9\)\{1to8\},%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 17 20 72 71 7f 	vcvtne2ps2bf16 0xfe0\(%rcx\),%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 17 b7 72 b2 00 f0 ff ff 	vcvtne2ps2bf16 -0x1000\(%rdx\)\{1to8\},%ymm29,%ymm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 22 17 07 72 b4 f5 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%rbp,%r14,8\),%xmm29,%xmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 17 10 72 31    	vcvtne2ps2bf16 \(%r9\)\{1to4\},%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 17 00 72 71 7f 	vcvtne2ps2bf16 0x7f0\(%rcx\),%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 17 97 72 a2 00 f8 ff ff 	vcvtne2ps2bf16 -0x800\(%rdx\)\{1to4\},%xmm29,%xmm28\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 7e 08 72 f5    	vcvtneps2bf16 %xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 02 7e 28 72 f5    	vcvtneps2bf16 %ymm29,%xmm30
+[ 	]*[a-f0-9]+:	62 22 7e 0f 72 b4 f5 00 00 00 10 	vcvtneps2bf16x 0x10000000\(%rbp,%r14,8\),%xmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 c2 7e 18 72 29    	vcvtneps2bf16 \(%r9\)\{1to4\},%xmm21
+[ 	]*[a-f0-9]+:	62 62 7e 08 72 71 7f 	vcvtneps2bf16x 0x7f0\(%rcx\),%xmm30
+[ 	]*[a-f0-9]+:	62 62 7e 9f 72 aa 00 f8 ff ff 	vcvtneps2bf16 -0x800\(%rdx\)\{1to4\},%xmm29\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 c2 7e 38 72 31    	vcvtneps2bf16 \(%r9\)\{1to8\},%xmm22
+[ 	]*[a-f0-9]+:	62 e2 7e 28 72 79 7f 	vcvtneps2bf16y 0xfe0\(%rcx\),%xmm23
+[ 	]*[a-f0-9]+:	62 62 7e bf 72 9a 00 f0 ff ff 	vcvtneps2bf16 -0x1000\(%rdx\)\{1to8\},%xmm27\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 16 20 52 f4    	vdpbf16ps %ymm28,%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 02 16 00 52 f4    	vdpbf16ps %xmm28,%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 22 16 27 52 b4 f5 00 00 00 10 	vdpbf16ps 0x10000000\(%rbp,%r14,8\),%ymm29,%ymm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 16 30 52 31    	vdpbf16ps \(%r9\)\{1to8\},%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 16 20 52 71 7f 	vdpbf16ps 0xfe0\(%rcx\),%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 16 b7 52 b2 00 f0 ff ff 	vdpbf16ps -0x1000\(%rdx\)\{1to8\},%ymm29,%ymm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 22 16 07 52 b4 f5 00 00 00 10 	vdpbf16ps 0x10000000\(%rbp,%r14,8\),%xmm29,%xmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 16 10 52 31    	vdpbf16ps \(%r9\)\{1to4\},%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 16 00 52 71 7f 	vdpbf16ps 0x7f0\(%rcx\),%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 16 97 52 b2 00 f8 ff ff 	vdpbf16ps -0x800\(%rdx\)\{1to4\},%xmm29,%xmm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 17 20 72 f4    	vcvtne2ps2bf16 %ymm28,%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 02 17 00 72 f4    	vcvtne2ps2bf16 %xmm28,%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 22 17 27 72 b4 f5 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%rbp,%r14,8\),%ymm29,%ymm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 17 30 72 31    	vcvtne2ps2bf16 \(%r9\)\{1to8\},%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 17 20 72 71 7f 	vcvtne2ps2bf16 0xfe0\(%rcx\),%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 17 b7 72 b2 00 f0 ff ff 	vcvtne2ps2bf16 -0x1000\(%rdx\)\{1to8\},%ymm29,%ymm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 22 17 07 72 b4 f5 00 00 00 10 	vcvtne2ps2bf16 0x10000000\(%rbp,%r14,8\),%xmm29,%xmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 17 10 72 31    	vcvtne2ps2bf16 \(%r9\)\{1to4\},%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 17 00 72 71 7f 	vcvtne2ps2bf16 0x7f0\(%rcx\),%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 17 97 72 b2 00 f8 ff ff 	vcvtne2ps2bf16 -0x800\(%rdx\)\{1to4\},%xmm29,%xmm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 7e 08 72 f5    	vcvtneps2bf16 %xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 02 7e 28 72 f5    	vcvtneps2bf16 %ymm29,%xmm30
+[ 	]*[a-f0-9]+:	62 22 7e 0f 72 b4 f5 00 00 00 10 	vcvtneps2bf16x 0x10000000\(%rbp,%r14,8\),%xmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 7e 18 72 09    	vcvtneps2bf16 \(%r9\)\{1to4\},%xmm25
+[ 	]*[a-f0-9]+:	62 62 7e 08 72 71 7f 	vcvtneps2bf16x 0x7f0\(%rcx\),%xmm30
+[ 	]*[a-f0-9]+:	62 62 7e 9f 72 b2 00 f8 ff ff 	vcvtneps2bf16 -0x800\(%rdx\)\{1to4\},%xmm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 42 7e 38 72 01    	vcvtneps2bf16 \(%r9\)\{1to8\},%xmm24
+[ 	]*[a-f0-9]+:	62 62 7e 28 72 71 7f 	vcvtneps2bf16y 0xfe0\(%rcx\),%xmm30
+[ 	]*[a-f0-9]+:	62 62 7e bf 72 b2 00 f0 ff ff 	vcvtneps2bf16 -0x1000\(%rdx\)\{1to8\},%xmm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 02 16 20 52 f4    	vdpbf16ps %ymm28,%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 02 16 00 52 f4    	vdpbf16ps %xmm28,%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 22 16 27 52 b4 f5 00 00 00 10 	vdpbf16ps 0x10000000\(%rbp,%r14,8\),%ymm29,%ymm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 16 30 52 31    	vdpbf16ps \(%r9\)\{1to8\},%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 16 20 52 71 7f 	vdpbf16ps 0xfe0\(%rcx\),%ymm29,%ymm30
+[ 	]*[a-f0-9]+:	62 62 16 b7 52 b2 00 f0 ff ff 	vdpbf16ps -0x1000\(%rdx\)\{1to8\},%ymm29,%ymm30\{%k7\}\{z\}
+[ 	]*[a-f0-9]+:	62 22 16 07 52 b4 f5 00 00 00 10 	vdpbf16ps 0x10000000\(%rbp,%r14,8\),%xmm29,%xmm30\{%k7\}
+[ 	]*[a-f0-9]+:	62 42 16 10 52 31    	vdpbf16ps \(%r9\)\{1to4\},%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 16 00 52 71 7f 	vdpbf16ps 0x7f0\(%rcx\),%xmm29,%xmm30
+[ 	]*[a-f0-9]+:	62 62 16 97 52 b2 00 f8 ff ff 	vdpbf16ps -0x800\(%rdx\)\{1to4\},%xmm29,%xmm30\{%k7\}\{z\}
+#pass
diff --git a/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.s b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.s
new file mode 100644
index 0000000000..e7c3a0aee4
--- /dev/null
+++ b/gas/testsuite/gas/i386/x86-64-avx512_bf16_vl.s
@@ -0,0 +1,65 @@ 
+# Check 64bit AVX512{BF16,VL} instructions
+
+	.allow_index_reg
+	.text
+_start:
+	vcvtne2ps2bf16	%ymm28, %ymm29, %ymm30	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	%xmm28, %xmm29, %xmm30	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	0x10000000(%rbp, %r14, 8), %ymm29, %ymm30{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	(%r9){1to8}, %ymm29, %ymm30	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	4064(%rcx), %ymm29, %ymm30	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	-4096(%rdx){1to8}, %ymm29, %ymm30{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtne2ps2bf16	0x10000000(%rbp, %r14, 8), %xmm29, %xmm30{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	(%r9){1to4}, %xmm29, %xmm30	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	2032(%rcx), %xmm29, %xmm30	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	-2048(%rdx){1to4}, %xmm29, %xmm28{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	%xmm29, %xmm30	 #AVX512{BF16,VL}
+	vcvtneps2bf16	%ymm29, %xmm30	 #AVX512{BF16,VL}
+	vcvtneps2bf16x	0x10000000(%rbp, %r14, 8), %xmm30{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	(%r9){1to4}, %xmm21	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16x	2032(%rcx), %xmm30	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	-2048(%rdx){1to4}, %xmm29{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	(%r9){1to8}, %xmm22	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16y	4064(%rcx), %xmm23	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	-4096(%rdx){1to8}, %xmm27{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	%ymm28, %ymm29, %ymm30	 #AVX512{BF16,VL}
+	vdpbf16ps	%xmm28, %xmm29, %xmm30	 #AVX512{BF16,VL}
+	vdpbf16ps	0x10000000(%rbp, %r14, 8), %ymm29, %ymm30{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	(%r9){1to8}, %ymm29, %ymm30	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	4064(%rcx), %ymm29, %ymm30	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	-4096(%rdx){1to8}, %ymm29, %ymm30{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	0x10000000(%rbp, %r14, 8), %xmm29, %xmm30{%k7}	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	(%r9){1to4}, %xmm29, %xmm30	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	2032(%rcx), %xmm29, %xmm30	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	-2048(%rdx){1to4}, %xmm29, %xmm30{%k7}{z}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+
+.intel_syntax noprefix
+	vcvtne2ps2bf16	ymm30, ymm29, ymm28	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	xmm30, xmm29, xmm28	 #AVX512{BF16,VL}
+	vcvtne2ps2bf16	ymm30{k7}, ymm29, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	ymm30, ymm29, DWORD PTR [r9]{1to8}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	ymm30, ymm29, YMMWORD PTR [rcx+4064]	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	ymm30{k7}{z}, ymm29, DWORD PTR [rdx-4096]{1to8}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtne2ps2bf16	xmm30{k7}, xmm29, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtne2ps2bf16	xmm30, xmm29, DWORD PTR [r9]{1to4}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtne2ps2bf16	xmm30, xmm29, XMMWORD PTR [rcx+2032]	 #AVX512{BF16,VL} Disp8
+	vcvtne2ps2bf16	xmm30{k7}{z}, xmm29, DWORD PTR [rdx-2048]{1to4}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	xmm30, xmm29	 #AVX512{BF16,VL}
+	vcvtneps2bf16	xmm30, ymm29	 #AVX512{BF16,VL}
+	vcvtneps2bf16	xmm30{k7}, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vcvtneps2bf16	xmm25, DWORD PTR [r9]{1to4}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16	xmm30, XMMWORD PTR [rcx+2032]	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	xmm30{k7}{z}, DWORD PTR [rdx-2048]{1to4}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vcvtneps2bf16	xmm24, DWORD PTR [r9]{1to8}	 #AVX512{BF16,VL} BROADCAST_EN
+	vcvtneps2bf16	xmm30, YMMWORD PTR [rcx+4064]	 #AVX512{BF16,VL} Disp8
+	vcvtneps2bf16	xmm30{k7}{z}, DWORD PTR [rdx-4096]{1to8}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	ymm30, ymm29, ymm28	 #AVX512{BF16,VL}
+	vdpbf16ps	xmm30, xmm29, xmm28	 #AVX512{BF16,VL}
+	vdpbf16ps	ymm30{k7}, ymm29, YMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	ymm30, ymm29, DWORD PTR [r9]{1to8}	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	ymm30, ymm29, YMMWORD PTR [rcx+4064]	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	ymm30{k7}{z}, ymm29, DWORD PTR [rdx-4096]{1to8}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
+	vdpbf16ps	xmm30{k7}, xmm29, XMMWORD PTR [rbp+r14*8+0x10000000]	 #AVX512{BF16,VL} MASK_ENABLING
+	vdpbf16ps	xmm30, xmm29, DWORD PTR [r9]{1to4}	 #AVX512{BF16,VL} BROADCAST_EN
+	vdpbf16ps	xmm30, xmm29, XMMWORD PTR [rcx+2032]	 #AVX512{BF16,VL} Disp8
+	vdpbf16ps	xmm30{k7}{z}, xmm29, DWORD PTR [rdx-2048]{1to4}	 #AVX512{BF16,VL} Disp8 BROADCAST_EN MASK_ENABLING ZEROCTL
diff --git a/opcodes/i386-dis-evex.h b/opcodes/i386-dis-evex.h
index dec7fc4420..9fea25defc 100644
--- a/opcodes/i386-dis-evex.h
+++ b/opcodes/i386-dis-evex.h
@@ -2020,7 +2020,7 @@  static const struct dis386 evex_table[][256] = {
   /* PREFIX_EVEX_0F3852 */
   {
     { Bad_Opcode },
-    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F3852_P_1) },
     { "vpdpwssd",	{ XM, Vex, EXx }, 0 },
     { "vp4dpwssd",	{ XM, Vex, EXxmm }, 0 },
   },
@@ -2112,8 +2112,9 @@  static const struct dis386 evex_table[][256] = {
   /* PREFIX_EVEX_0F3872 */
   {
     { Bad_Opcode },
-    { Bad_Opcode },
+    { VEX_W_TABLE (EVEX_W_0F3872_P_1) },
     { VEX_W_TABLE (EVEX_W_0F3872_P_2) },
+    { VEX_W_TABLE (EVEX_W_0F3872_P_3) },
   },
   /* PREFIX_EVEX_0F3873 */
   {
@@ -3705,6 +3706,11 @@  static const struct dis386 evex_table[][256] = {
     { "vpmulld",	{ XM, Vex, EXx }, 0 },
     { "vpmullq",	{ XM, Vex, EXx }, 0 },
   },
+  /* EVEX_W_0F3852_P_1 */
+  {
+    { "vdpbf16ps",	{ XM, Vex, EXx }, 0 },
+    { Bad_Opcode },
+  },
   /* EVEX_W_0F3854_P_2 */
   {
     { "vpopcntb",	{ XM, EXx }, 0 },
@@ -3759,11 +3765,21 @@  static const struct dis386 evex_table[][256] = {
     { "vpshldvd",  { XM, Vex, EXx }, 0 },
     { "vpshldvq",  { XM, Vex, EXx }, 0 },
   },
+  /* EVEX_W_0F3872_P_1 */
+  {
+    { "vcvtneps2bf16%XY", { XMxmmq, EXx }, 0 },
+    { Bad_Opcode },
+  },
   /* EVEX_W_0F3872_P_2 */
   {
     { Bad_Opcode },
     { "vpshrdvw",  { XM, Vex, EXx }, 0 },
   },
+  /* EVEX_W_0F3872_P_3 */
+  {
+    { "vcvtne2ps2bf16", { XM, Vex, EXx }, 0 },
+    { Bad_Opcode },
+  },
   /* EVEX_W_0F3873_P_2 */
   {
     { "vpshrdvd",  { XM, Vex, EXx }, 0 },
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 8320924add..1ba7b4f2a3 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -2189,6 +2189,7 @@  enum
   EVEX_W_0F3839_P_1,
   EVEX_W_0F383A_P_1,
   EVEX_W_0F3840_P_2,
+  EVEX_W_0F3852_P_1,
   EVEX_W_0F3854_P_2,
   EVEX_W_0F3855_P_2,
   EVEX_W_0F3858_P_2,
@@ -2200,7 +2201,9 @@  enum
   EVEX_W_0F3866_P_2,
   EVEX_W_0F3870_P_2,
   EVEX_W_0F3871_P_2,
+  EVEX_W_0F3872_P_1,
   EVEX_W_0F3872_P_2,
+  EVEX_W_0F3872_P_3,
   EVEX_W_0F3873_P_2,
   EVEX_W_0F3875_P_2,
   EVEX_W_0F3878_P_2,
diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 2b93063b67..847a7bb1d9 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -231,6 +231,8 @@  static initializer cpu_flag_init[] =
     "CPU_AVX512F_FLAGS|CpuAVX512_VNNI" },
   { "CPU_AVX512_BITALG_FLAGS",
     "CPU_AVX512F_FLAGS|CpuAVX512_BITALG" },
+  { "CPU_AVX512_BF16_FLAGS",
+    "CPU_AVX512F_FLAGS|CpuAVX512_BF16" },
   { "CPU_L1OM_FLAGS",
     "unknown" },
   { "CPU_K1OM_FLAGS",
@@ -324,7 +326,7 @@  static initializer cpu_flag_init[] =
   { "CPU_ANY_AVX2_FLAGS",
     "CPU_ANY_AVX512F_FLAGS|CpuAVX2" },
   { "CPU_ANY_AVX512F_FLAGS",
-    "CpuAVX512F|CpuAVX512CD|CpuAVX512ER|CpuAVX512PF|CpuAVX512DQ|CpuAVX512BW|CpuAVX512VL|CpuAVX512IFMA|CpuAVX512VBMI|CpuAVX512_4FMAPS|CpuAVX512_4VNNIW|CpuAVX512_VPOPCNTDQ|CpuAVX512_VBMI2|CpuAVX512_VNNI|CpuAVX512_BITALG" },
+    "CpuAVX512F|CpuAVX512CD|CpuAVX512ER|CpuAVX512PF|CpuAVX512DQ|CpuAVX512BW|CpuAVX512VL|CpuAVX512IFMA|CpuAVX512VBMI|CpuAVX512_4FMAPS|CpuAVX512_4VNNIW|CpuAVX512_VPOPCNTDQ|CpuAVX512_VBMI2|CpuAVX512_VNNI|CpuAVX512_BITALG|CpuAVX512_BF16" },
   { "CPU_ANY_AVX512CD_FLAGS",
     "CpuAVX512CD" },
   { "CPU_ANY_AVX512ER_FLAGS",
@@ -357,6 +359,8 @@  static initializer cpu_flag_init[] =
     "CpuAVX512_VNNI" },
   { "CPU_ANY_AVX512_BITALG_FLAGS",
     "CpuAVX512_BITALG" },
+  { "CPU_ANY_AVX512_BF16_FLAGS",
+    "CpuAVX512_BF16" },
   { "CPU_ANY_MOVDIRI_FLAGS",
     "CpuMOVDIRI" },
   { "CPU_ANY_MOVDIR64B_FLAGS",
@@ -578,6 +582,7 @@  static bitfield cpu_flags[] =
   BITFIELD (CpuAVX512_VBMI2),
   BITFIELD (CpuAVX512_VNNI),
   BITFIELD (CpuAVX512_BITALG),
+  BITFIELD (CpuAVX512_BF16),
   BITFIELD (CpuMWAITX),
   BITFIELD (CpuCLZERO),
   BITFIELD (CpuOSPKE),
diff --git a/opcodes/i386-opc.h b/opcodes/i386-opc.h
index 1516dd96b4..258a218d87 100644
--- a/opcodes/i386-opc.h
+++ b/opcodes/i386-opc.h
@@ -206,6 +206,8 @@  enum
   CpuAVX512_VNNI,
   /* Intel AVX-512 BITALG Instructions support required.  */
   CpuAVX512_BITALG,
+  /* Intel AVX-512 BF16 Instructions support required.  */
+  CpuAVX512_BF16,
   /* mwaitx instruction required */
   CpuMWAITX,
   /* Clzero instruction required */
@@ -347,6 +349,7 @@  typedef union i386_cpu_flags
       unsigned int cpuavx512_vbmi2:1;
       unsigned int cpuavx512_vnni:1;
       unsigned int cpuavx512_bitalg:1;
+      unsigned int cpuavx512_bf16:1;
       unsigned int cpumwaitx:1;
       unsigned int cpuclzero:1;
       unsigned int cpuospke:1;
diff --git a/opcodes/i386-opc.tbl b/opcodes/i386-opc.tbl
index 26a68d8cbe..56fe4ef356 100644
--- a/opcodes/i386-opc.tbl
+++ b/opcodes/i386-opc.tbl
@@ -4710,3 +4710,33 @@  movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|CpuNo64, Modrm|IgnoreSize|No_bSu
 movdir64b, 2, 0x660f38f8, None, 3, CpuMOVDIR64B|Cpu64, Modrm|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|NoRex64|AddrPrefixOpReg, { Unspecified|BaseIndex, Reg32|Reg64 }
 
 // MOVEDIR instructions end.
+
+// AVX512_BF16 instructions.
+
+vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }
+vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }
+
+vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
+vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }
+vcvtne2ps2bf16, 3, 0xf272, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }
+
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|BaseIndex, RegXMM }
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|BaseIndex, RegXMM }
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegYMM }
+
+vcvtneps2bf16x, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|Masking=3|VexW0|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM }
+vcvtneps2bf16y, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|Masking=3|VexW0|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf|ATTSyntax, { Dword|Ymmword|Unspecified|BaseIndex, RegXMM }
+
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM }
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegXMM }
+vcvtneps2bf16, 2, 0xf372, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegYMM }
+
+vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=4|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Xmmword|Unspecified|BaseIndex, RegXMM, RegXMM }
+vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=5|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Ymmword|Unspecified|BaseIndex, RegYMM, RegYMM }
+vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|Broadcast|Disp8MemShift=6|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { Dword|Zmmword|Unspecified|BaseIndex, RegZMM, RegZMM }
+
+vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex128|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegXMM, RegXMM, RegXMM }
+vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16|CpuAVX512VL, Modrm|VexOpcode=1|EVex256|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegYMM, RegYMM, RegYMM }
+vdpbf16ps, 3, 0xf352, None, 1, CpuAVX512_BF16, Modrm|VexOpcode=1|EVex512|VexVVVV=1|Masking=3|VexW0|IgnoreSize|No_bSuf|No_wSuf|No_lSuf|No_sSuf|No_qSuf|No_ldSuf, { RegZMM, RegZMM, RegZMM }
+// AVX512_BF16 instructions end.