x86: fix SSE4a dependencies of ".arch .nosse*"

Message ID 3bc597bb-10f9-80f9-8e00-f28aeb2eea77@suse.com
State New

Commit Message

Jan Beulich Feb. 12, 2020, 5:08 p.m.
Since ".arch sse4a" enables SSE3 and earlier, disabling SSE3 should also
disable SSE4a. And as per its name, ".arch .nosse4" should also do so.

gas/
2020-02-XX  Jan Beulich  <jbeulich@suse.com>

	* config/tc-i386.c (cpu_noarch): Use CPU_ANY_SSE4_FLAGS in
	"nosse4" entry.

opcodes/
2020-02-XX  Jan Beulich  <jbeulich@suse.com>

	* i386-gen.c (cpu_flag_init): Move CpuSSE4a from
	CPU_ANY_SSE_FLAGS entry to CPU_ANY_SSE3_FLAGS one. Add
	CPU_ANY_SSE4_FLAGS entry.
	* i386-init.h: Re-generate.
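
The dependency described in this commit message can be modeled with a small bitmask sketch. This is only an illustration under stated assumptions: the F_* bits, the ANY_* macros, ENABLE_SSE4A and disable() are hypothetical stand-ins chosen for this example, not the generated i386_cpu_flags code that gas actually uses. The ANY_* chains mirror the cpu_flag_init strings as they read with this patch applied.

```c
#include <assert.h>

/* Hypothetical one-bit-per-feature model (NOT the real i386_cpu_flags). */
#define F_SSE    (1u << 0)
#define F_SSE2   (1u << 1)
#define F_SSE3   (1u << 2)
#define F_SSSE3  (1u << 3)
#define F_SSE4_1 (1u << 4)
#define F_SSE4_2 (1u << 5)
#define F_SSE4A  (1u << 6)

/* "Disable" masks, mirroring the cpu_flag_init entries after this patch. */
#define ANY_SSE4_2 (F_SSE4_2)
#define ANY_SSE4_1 (ANY_SSE4_2 | F_SSE4_1)
#define ANY_SSSE3  (ANY_SSE4_1 | F_SSSE3)
#define ANY_SSE3   (ANY_SSSE3 | F_SSE3 | F_SSE4A)  /* CpuSSE4a moved here */
#define ANY_SSE2   (ANY_SSE3 | F_SSE2)
#define ANY_SSE    (ANY_SSE2 | F_SSE)
#define ANY_SSE4   (ANY_SSE4_1 | F_SSE4A)          /* new entry for "nosse4" */

/* Assumed model of ".arch .sse4a": SSE4a plus SSE3 and earlier. */
#define ENABLE_SSE4A (F_SSE | F_SSE2 | F_SSE3 | F_SSE4A)

/* Model of ".arch .noXXX": clear every feature bit named by the mask. */
static unsigned disable(unsigned enabled, unsigned any_mask)
{
  return enabled & ~any_mask;
}

/* Sanity checks for the two behaviours the commit message asks for. */
static void self_check(void)
{
  /* Disabling SSE3 also takes SSE4a away. */
  assert((disable(ENABLE_SSE4A, ANY_SSE3) & F_SSE4A) == 0);
  /* With this patch, "nosse4" clears SSE4.1, SSE4.2 and SSE4a. */
  assert((disable(ENABLE_SSE4A | F_SSE4_1 | F_SSE4_2, ANY_SSE4)
          & (F_SSE4_1 | F_SSE4_2 | F_SSE4A)) == 0);
}
```

In this model, disable(ENABLE_SSE4A, ANY_SSE3) leaves only the SSE and SSE2 bits set, which is the behaviour argued for in the first sentence of the commit message.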

Comments

H.J. Lu Feb. 12, 2020, 5:18 p.m. | #1
On Wed, Feb 12, 2020 at 9:08 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> Since ".arch sse4a" enables SSE3 and earlier, disabling SSE3 should also
> disable SSE4a. And as per its name, ".arch .nosse4" should also do so.
>
> gas/
> 2020-02-XX  Jan Beulich  <jbeulich@suse.com>
>
>         * config/tc-i386.c (cpu_noarch): Use CPU_ANY_SSE4_FLAGS in
>         "nosse4" entry.
>
> opcodes/
> 2020-02-XX  Jan Beulich  <jbeulich@suse.com>
>
>         * i386-gen.c (cpu_flag_init): Move CpuSSE4a from
>         CPU_ANY_SSE_FLAGS entry to CPU_ANY_SSE3_FLAGS one. Add
>         CPU_ANY_SSE4_FLAGS entry.
>         * i386-init.h: Re-generate.
>

OK.

Thanks.

-- 
H.J.
H.J. Lu Feb. 16, 2020, 4:47 p.m. | #2
On Wed, Feb 12, 2020 at 9:18 AM H.J. Lu <hjl.tools@gmail.com> wrote:
> OK.
>
> Thanks.

Commit 7deea9aad8 changed nosse4 to include CpuSSE4a.  But AMD SSE4a is
a superset of SSE3 and Intel SSE4 is a superset of SSSE3.  Disabling Intel
SSE4 shouldn't disable AMD SSE4a.  This patch restores nosse4.  It also
adds .sse4a and nosse4a.

-- 
H.J.
Alan Modra Feb. 17, 2020, 1:06 a.m. | #3
On Sun, Feb 16, 2020 at 08:47:56AM -0800, H.J. Lu wrote:
> commit 7deea9aad8 changed nosse4 to include CpuSSE4a.  But AMD SSE4a is
> a superset of SSE3 and Intel SSE4 is a superset of SSSE3.  Disable Intel
> SSE4 shouldn't disable AMD SSE4a.  This patch restores nosse4.  It also
> adds .sse4a and nosse4a.

diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 79f4cc9d25..45106bcf6d 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -326,6 +326,8 @@ static initializer cpu_flag_init[] =
   { "CPU_ANY_SSE2_FLAGS",
     "CPU_ANY_SSE3_FLAGS|CpuSSE2" },
   { "CPU_ANY_SSE3_FLAGS",
+  { "CPU_ANY_SSE4A_FLAGS",
+    "CPU_ANY_SSE3_FLAGS|CpuSSE4a" },
     "CPU_ANY_SSSE3_FLAGS|CpuSSE3|CpuSSE4a" },
   { "CPU_ANY_SSSE3_FLAGS",
     "CPU_ANY_SSE4_1_FLAGS|CpuSSSE3" },
@@ -333,8 +335,6 @@ static initializer cpu_flag_init[] =
     "CPU_ANY_SSE4_2_FLAGS|CpuSSE4_1" },
   { "CPU_ANY_SSE4_2_FLAGS",
     "CpuSSE4_2" },
-  { "CPU_ANY_SSE4_FLAGS",
-    "CPU_ANY_SSE4_1_FLAGS|CpuSSE4a" },
   { "CPU_ANY_AVX_FLAGS",
     "CPU_ANY_AVX2_FLAGS|CpuF16C|CpuFMA|CpuFMA4|CpuXOP|CpuAVX" },
   { "CPU_ANY_AVX2_FLAGS",

Merge error?

-- 
Alan Modra
Australia Development Lab, IBM
H.J. Lu Feb. 17, 2020, 1:19 a.m. | #4
On Sun, Feb 16, 2020 at 5:06 PM Alan Modra <amodra@gmail.com> wrote:
> Merge error?


Is there anything wrong?


-- 
H.J.
Alan Modra Feb. 17, 2020, 1:31 a.m. | #5
On Sun, Feb 16, 2020 at 05:19:39PM -0800, H.J. Lu wrote:
> On Sun, Feb 16, 2020 at 5:06 PM Alan Modra <amodra@gmail.com> wrote:

> > Merge error?
>
> Is there anything wrong?

It doesn't compile.  The CPU_ANY_SSE4A_FLAGS entry is added inside the
CPU_ANY_SSE3_FLAGS entry.  Take a look at the diff.
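
For reference, a minimal sketch of what a well-formed table looks like, with the new entry as a sibling rather than nested inside the CPU_ANY_SSE3_FLAGS one. The type initializer_sketch, the array cpu_flag_init_sketch and the helpers below are hypothetical names for this illustration; only the quoted strings are taken from the diff, and the real table in i386-gen.c has many more entries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the name/value pair type used by the table. */
typedef struct
{
  const char *name;
  const char *init;
} initializer_sketch;

/* Each entry is a complete { name, init } pair; none is nested in another. */
static const initializer_sketch cpu_flag_init_sketch[] =
{
  { "CPU_ANY_SSE3_FLAGS",
    "CPU_ANY_SSSE3_FLAGS|CpuSSE3|CpuSSE4a" },
  { "CPU_ANY_SSSE3_FLAGS",
    "CPU_ANY_SSE4_1_FLAGS|CpuSSSE3" },
  { "CPU_ANY_SSE4_1_FLAGS",
    "CPU_ANY_SSE4_2_FLAGS|CpuSSE4_1" },
  { "CPU_ANY_SSE4_2_FLAGS",
    "CpuSSE4_2" },
  { "CPU_ANY_SSE4A_FLAGS",
    "CPU_ANY_SSE3_FLAGS|CpuSSE4a" },
};

#define SKETCH_COUNT \
  (sizeof (cpu_flag_init_sketch) / sizeof (cpu_flag_init_sketch[0]))

/* Return the init string for NAME, or NULL if the table has no such entry. */
static const char *lookup_init(const char *name)
{
  size_t i;

  for (i = 0; i < SKETCH_COUNT; i++)
    if (strcmp(cpu_flag_init_sketch[i].name, name) == 0)
      return cpu_flag_init_sketch[i].init;
  return NULL;
}

/* Every entry must carry both a name and an init string. */
static void check_table(void)
{
  size_t i;

  for (i = 0; i < SKETCH_COUNT; i++)
    assert(cpu_flag_init_sketch[i].name != NULL
           && cpu_flag_init_sketch[i].init != NULL);
}
```

With the entry placed between the opening brace and the init string of another entry, as in the quoted hunk, the array initializer is no longer a sequence of complete pairs, which is why the file fails to build.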

-- 
Alan Modra
Australia Development Lab, IBM
Alan Modra Feb. 17, 2020, 3:12 a.m. | #6
On Mon, Feb 17, 2020 at 12:01:56PM +1030, Alan Modra wrote:
> It doesn't compile.  The CPU_ANY_SSE4A_FLAGS entry is added inside the
> CPU_ANY_SSE3_FLAGS entry.  Take a look at the diff.

Since it is probably getting late for you I committed this to fix the
problem.

	* i386-gen.c (cpu_flag_init): Correct last change.

diff --git a/opcodes/i386-gen.c b/opcodes/i386-gen.c
index 45106bcf6d..407479261c 100644
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -326,8 +326,6 @@ static initializer cpu_flag_init[] =
   { "CPU_ANY_SSE2_FLAGS",
     "CPU_ANY_SSE3_FLAGS|CpuSSE2" },
   { "CPU_ANY_SSE3_FLAGS",
-  { "CPU_ANY_SSE4A_FLAGS",
-    "CPU_ANY_SSE3_FLAGS|CpuSSE4a" },
     "CPU_ANY_SSSE3_FLAGS|CpuSSE3|CpuSSE4a" },
   { "CPU_ANY_SSSE3_FLAGS",
     "CPU_ANY_SSE4_1_FLAGS|CpuSSSE3" },
@@ -335,6 +333,8 @@ static initializer cpu_flag_init[] =
     "CPU_ANY_SSE4_2_FLAGS|CpuSSE4_1" },
   { "CPU_ANY_SSE4_2_FLAGS",
     "CpuSSE4_2" },
+  { "CPU_ANY_SSE4A_FLAGS",
+    "CPU_ANY_SSE3_FLAGS|CpuSSE4a" },
   { "CPU_ANY_AVX_FLAGS",
     "CPU_ANY_AVX2_FLAGS|CpuF16C|CpuFMA|CpuFMA4|CpuXOP|CpuAVX" },
   { "CPU_ANY_AVX2_FLAGS",

-- 
Alan Modra
Australia Development Lab, IBM
Jan Beulich Feb. 17, 2020, 3:27 p.m. | #7
On 16.02.2020 17:47, H.J. Lu wrote:
> commit 7deea9aad8 changed nosse4 to include CpuSSE4a.  But AMD SSE4a is
> a superset of SSE3 and Intel SSE4 is a superset of SSSE3.  Disable Intel
> SSE4 shouldn't disable AMD SSE4a.  This patch restores nosse4.  It also
> adds .sse4a and nosse4a.

And where is it said that "nosse4" means only the Intel flavors? As
said in the commit message of said change, to me the clear implication
is that anything called SSE4* will get disabled.

Jan
H.J. Lu Feb. 17, 2020, 3:30 p.m. | #8
On Mon, Feb 17, 2020 at 7:27 AM Jan Beulich <jbeulich@suse.com> wrote:
> And where is it said that "nosse4" means only the Intel flavors? As
> said in the commit message of said change, to me the clear implication
> is that anything called SSE4* will get disabled.

SSE4 refers to SSE4 from Intel, which includes SSE4.1 and SSE4.2.
SSE4a from AMD is unrelated to Intel SSE4.


-- 
H.J.
Jan Beulich Feb. 17, 2020, 3:32 p.m. | #9
On 17.02.2020 16:30, H.J. Lu wrote:
> SSE4 refers to SSE4 from Intel, which includes SSE4.1 and SSE4.2.
> SSE4a from AMD is unrelated from Intel SSE4.

Repeating my question then: Where is this being said? (Best imo
would be to delete ".arch .nosse4" support then, eliminating
the ambiguity.)

Jan
H.J. Lu Feb. 17, 2020, 3:44 p.m. | #10
On Mon, Feb 17, 2020 at 7:32 AM Jan Beulich <jbeulich@suse.com> wrote:
> Repeating my question then: Where is this being said? (Best imo
> would be to delete ".arch .nosse4" support then, eliminating
> the ambiguity.)

We have both .sse4 and nosse4 which are aliases for SSE4.2.  Please
feel free to add documentation.

-- 
H.J.
Jan Beulich Feb. 17, 2020, 3:49 p.m. | #11
On 17.02.2020 16:44, H.J. Lu wrote:
> We have both .sse4 and nosse4 which are aliases for SSE4.2.  Please
> feel free to add documentation.

If it's not documented, then it's not clear at all what the intention
is. I'm certainly not going to add documentation saying something that
I don't believe should be said. I.e. if I were to add documentation
here, it'd say .nosse4 covers all three SSE4* variants (and it would
then be a bug of the implementation that this isn't the case).

Just like for the MOVSX/MOVZX issue, I really dislike you making
statements of things that were (apparently) never settled on.

Jan
H.J. Lu Feb. 17, 2020, 4:52 p.m. | #12
On Mon, Feb 17, 2020 at 7:49 AM Jan Beulich <jbeulich@suse.com> wrote:
> If it's not documented, then it's not clear at all what the intention
> is. I'm certainly not going to add documentation saying something that
> I don't believe should be said. I.e. if I were to add documentation
> here, it'd say .nosse4 covers all three SSE4* variants (and it would
> then be a bug of the implementation that this isn't the case).

From gcc/config/i386/i386.opt:

msse4.1
Target Report Mask(ISA_SSE4_1) Var(ix86_isa_flags) Save
Support MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.1 built-in functions and
code generation.

msse4.2
Target Report Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in
functions and code generation.

msse4
Target RejectNegative Report Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in
functions and code generation.

mno-sse4
Target RejectNegative Report InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save
Do not support SSE4.1 and SSE4.2 built-in functions and code generation.

SSE4 is for Intel SSE4 only.
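
As an illustration of the gcc side of this, the compiler exposes separate predefined macros for the Intel and AMD extensions: __SSE4_1__, __SSE4_2__ and __SSE4A__. The sketch below only reports which of them the current compilation has defined; the helper names are made up for this example, and which macros are set depends entirely on the -m options (or -march) passed to the compiler.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if the compiler was told to target Intel SSE4.1, else 0. */
static int have_sse4_1(void)
{
#ifdef __SSE4_1__
  return 1;
#else
  return 0;
#endif
}

/* 1 if the compiler was told to target Intel SSE4.2, else 0. */
static int have_sse4_2(void)
{
#ifdef __SSE4_2__
  return 1;
#else
  return 0;
#endif
}

/* 1 if the compiler was told to target AMD SSE4a, else 0. */
static int have_sse4a(void)
{
#ifdef __SSE4A__
  return 1;
#else
  return 0;
#endif
}

/* Print the three settings and check the Intel nesting: per the quoted
   option text, SSE4.2 support includes SSE4.1.  */
static void report_sse4_macros(void)
{
  assert(!have_sse4_2() || have_sse4_1());
  printf("SSE4.1=%d SSE4.2=%d SSE4a=%d\n",
         have_sse4_1(), have_sse4_2(), have_sse4a());
}
```

Calling report_sse4_macros() from a program built once with -msse4 -msse4a and once with -msse4 -msse4a -mno-sse4 is one way to observe whether the SSE4a setting survives -mno-sse4 on a given gcc version.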

> Just like for the MOVSX/MOVZX issue, I really dislike you making
> statements of things that were (apparently) never settled on.
>
> Jan

-- 
H.J.
Jan Beulich Feb. 17, 2020, 5:01 p.m. | #13
On 17.02.2020 17:52, H.J. Lu wrote:
> From gcc/config/i386/i386.opt:
>
> msse4.1
> Target Report Mask(ISA_SSE4_1) Var(ix86_isa_flags) Save
> Support MMX, SSE, SSE2, SSE3, SSSE3 and SSE4.1 built-in functions and
> code generation.
>
> msse4.2
> Target Report Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
> Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in
> functions and code generation.
>
> msse4
> Target RejectNegative Report Mask(ISA_SSE4_2) Var(ix86_isa_flags) Save
> Support MMX, SSE, SSE2, SSE3, SSSE3, SSE4.1 and SSE4.2 built-in
> functions and code generation.
>
> mno-sse4
> Target RejectNegative Report InverseMask(ISA_SSE4_1) Var(ix86_isa_flags) Save
> Do not support SSE4.1 and SSE4.2 built-in functions and code generation.
>
> SSE4 is for Intel SSE4 only.

Hmm, okay, that's gcc, not gas, but at least something.

Jan
H.J. Lu Feb. 17, 2020, 5:05 p.m. | #14
On Mon, Feb 17, 2020 at 9:01 AM Jan Beulich <jbeulich@suse.com> wrote:
> Hmm, okay, that's gcc, not gas, but at least something.

Can you add a sentence for SSE4 to the gas manual?

-- 
H.J.

Patch

--- a/gas/config/tc-i386.c
+++ b/gas/config/tc-i386.c
@@ -1180,7 +1180,7 @@  static const noarch_entry cpu_noarch[] =
   { STRING_COMMA_LEN ("nossse3"),  CPU_ANY_SSSE3_FLAGS },
   { STRING_COMMA_LEN ("nosse4.1"),  CPU_ANY_SSE4_1_FLAGS },
   { STRING_COMMA_LEN ("nosse4.2"),  CPU_ANY_SSE4_2_FLAGS },
-  { STRING_COMMA_LEN ("nosse4"),  CPU_ANY_SSE4_1_FLAGS },
+  { STRING_COMMA_LEN ("nosse4"),  CPU_ANY_SSE4_FLAGS },
   { STRING_COMMA_LEN ("noavx"),  CPU_ANY_AVX_FLAGS },
   { STRING_COMMA_LEN ("noavx2"),  CPU_ANY_AVX2_FLAGS },
   { STRING_COMMA_LEN ("noavx512f"), CPU_ANY_AVX512F_FLAGS },
--- a/opcodes/i386-gen.c
+++ b/opcodes/i386-gen.c
@@ -322,17 +322,19 @@  static initializer cpu_flag_init[] =
   { "CPU_ANY_MMX_FLAGS",
     "CPU_3DNOWA_FLAGS" },
   { "CPU_ANY_SSE_FLAGS",
-    "CPU_ANY_SSE2_FLAGS|CpuSSE|CpuSSE4a" },
+    "CPU_ANY_SSE2_FLAGS|CpuSSE" },
   { "CPU_ANY_SSE2_FLAGS",
     "CPU_ANY_SSE3_FLAGS|CpuSSE2" },
   { "CPU_ANY_SSE3_FLAGS",
-    "CPU_ANY_SSSE3_FLAGS|CpuSSE3" },
+    "CPU_ANY_SSSE3_FLAGS|CpuSSE3|CpuSSE4a" },
   { "CPU_ANY_SSSE3_FLAGS",
     "CPU_ANY_SSE4_1_FLAGS|CpuSSSE3" },
   { "CPU_ANY_SSE4_1_FLAGS",
     "CPU_ANY_SSE4_2_FLAGS|CpuSSE4_1" },
   { "CPU_ANY_SSE4_2_FLAGS",
     "CpuSSE4_2" },
+  { "CPU_ANY_SSE4_FLAGS",
+    "CPU_ANY_SSE4_1_FLAGS|CpuSSE4a" },
   { "CPU_ANY_AVX_FLAGS",
     "CPU_ANY_AVX2_FLAGS|CpuF16C|CpuFMA|CpuFMA4|CpuXOP|CpuAVX" },
   { "CPU_ANY_AVX2_FLAGS",