[gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

Message ID 20190924131612.GA24708@delia
State New
Headers show
Series
  • [gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs
Related show

Commit Message

Tom de Vries Sept. 24, 2019, 1:16 p.m.
Hi,

On my openSUSE Leap 15.1 x86_64 Skylake system with the default (4.12) kernel,
I run into:
...
FAIL: gdb.base/gcore.exp: corefile restored all registers
...

The problem is that there's a difference in the mxcsr register value before
and after the gcore command:
...
- mxcsr          0x0                 [ ]
+ mxcsr          0x400440            [ DAZ OM ]
...

This can be traced back to amd64_linux_nat_target::fetch_registers, where
xstateregs is partially initialized by the ptrace call:
...
          char xstateregs[X86_XSTATE_MAX_SIZE];
          struct iovec iov;

          amd64_collect_xsave (regcache, -1, xstateregs, 0);
          iov.iov_base = xstateregs;
          iov.iov_len = sizeof (xstateregs);
          if (ptrace (PTRACE_GETREGSET, tid,
                      (unsigned int) NT_X86_XSTATE, (long) &iov) < 0)
            perror_with_name (_("Couldn't get extended state status"));

          amd64_supply_xsave (regcache, -1, xstateregs);
...
after which amd64_supply_xsave is called.

The amd64_supply_xsave call is supposed to only use initialized parts of
xstateregs, but due to a kernel bug on intel skylake (fixed from 4.14 onwards
by commit 0852b374173b "x86/fpu: Add FPU state copying quirk to handle XRSTOR
failure on Intel Skylake CPUs") it can happen that the mxcsr part of
xstateregs is not initialized, while amd64_supply_xsave expects it to be
initialized, which explains the FAIL mentioned above.

Fix the undetermined behaviour by initializing xstateregs before calling
ptrace, which makes sure we get a 0x0 for mxcsr when this kernel bug occurs,
and which also happens to fix the FAIL.

Furthermore, add an xfail for this FAIL which triggers the same kernel bug:
...
FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: \
  check new value of MXCSR is still in place
...

Both FAILs pass when using a 5.3 kernel instead on the system mentioned above.

Tested on x86_64-linux.

OK for trunk?

Thanks,
- Tom

[gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

gdb/ChangeLog:

2019-09-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/23815
	* amd64-linux-nat.c (amd64_linux_nat_target::fetch_registers):
	Initialize xstateregs before ptrace PTRACE_GETREGSET call.

gdb/testsuite/ChangeLog:

2019-09-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/24598
	* gdb.arch/amd64-init-x87-values.exp: Add xfail.

---
 gdb/amd64-linux-nat.c                            |  1 +
 gdb/testsuite/gdb.arch/amd64-init-x87-values.exp | 19 +++++++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

Comments

Tom Tromey Sept. 24, 2019, 8:06 p.m. | #1
>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:


Tom> [gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

Tom> gdb/ChangeLog:

Tom> 2019-09-24  Tom de Vries  <tdevries@suse.de>

Tom> 	PR gdb/23815
Tom> 	* amd64-linux-nat.c (amd64_linux_nat_target::fetch_registers):
Tom> 	Initialize xstateregs before ptrace PTRACE_GETREGSET call.

Thanks for the patch.

I think this is reasonable, but it could use a comment before the memset
explaining why it is there.  Maybe some future gdb hacker will want to
remove the memset, and a comment would explain why not to, or when it
would be ok to.

thanks,
Tom
Tom de Vries Sept. 25, 2019, 7:07 a.m. | #2
On 24-09-19 22:06, Tom Tromey wrote:
>>>>>> "Tom" == Tom de Vries <tdevries@suse.de> writes:

> 

> Tom> [gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

> 

> Tom> gdb/ChangeLog:

> 

> Tom> 2019-09-24  Tom de Vries  <tdevries@suse.de>

> 

> Tom> 	PR gdb/23815

> Tom> 	* amd64-linux-nat.c (amd64_linux_nat_target::fetch_registers):

> Tom> 	Initialize xstateregs before ptrace PTRACE_GETREGSET call.

> 

> Thanks for the patch.

> 

> I think this is reasonable, but it could use a comment before the memset

> explaining why it is there.  Maybe some future gdb hacker will want to

> remove the memset, and a comment would explain why not to, or when it

> would be ok to.


Hi Tom,

thanks for the review.

Added comment and committed as below.

Thanks,
- Tom
[gdb/tdep] Handle mxcsr kernel bug on Intel Skylake CPUs

On my openSUSE Leap 15.1 x86_64 Skylake system with the default (4.12) kernel,
I run into:
...
FAIL: gdb.base/gcore.exp: corefile restored all registers
...

The problem is that there's a difference in the mxcsr register value before
and after the gcore command:
...
- mxcsr          0x0                 [ ]
+ mxcsr          0x400440            [ DAZ OM ]
...

This can be traced back to amd64_linux_nat_target::fetch_registers, where
xstateregs is partially initialized by the ptrace call:
...
          char xstateregs[X86_XSTATE_MAX_SIZE];
          struct iovec iov;

          amd64_collect_xsave (regcache, -1, xstateregs, 0);
          iov.iov_base = xstateregs;
          iov.iov_len = sizeof (xstateregs);
          if (ptrace (PTRACE_GETREGSET, tid,
                      (unsigned int) NT_X86_XSTATE, (long) &iov) < 0)
            perror_with_name (_("Couldn't get extended state status"));

          amd64_supply_xsave (regcache, -1, xstateregs);
...
after which amd64_supply_xsave is called.

The amd64_supply_xsave call is supposed to only use initialized parts of
xstateregs, but due to a kernel bug on intel skylake (fixed from 4.14 onwards
by commit 0852b374173b "x86/fpu: Add FPU state copying quirk to handle XRSTOR
failure on Intel Skylake CPUs") it can happen that the mxcsr part of
xstateregs is not initialized, while amd64_supply_xsave expects it to be
initialized, which explains the FAIL mentioned above.

Fix the undetermined behaviour by initializing xstateregs before calling
ptrace, which makes sure we get a 0x0 for mxcsr when this kernel bug occurs,
and which also happens to fix the FAIL.

Furthermore, add an xfail for this FAIL which triggers the same kernel bug:
...
FAIL: gdb.arch/amd64-init-x87-values.exp: check_setting_mxcsr_before_enable: \
  check new value of MXCSR is still in place
...

Both FAILs pass when using a 5.3 kernel instead on the system mentioned above.

Tested on x86_64-linux.

gdb/ChangeLog:

2019-09-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/23815
	* amd64-linux-nat.c (amd64_linux_nat_target::fetch_registers):
	Initialize xstateregs before ptrace PTRACE_GETREGSET call.

gdb/testsuite/ChangeLog:

2019-09-24  Tom de Vries  <tdevries@suse.de>

	PR gdb/24598
	* gdb.arch/amd64-init-x87-values.exp: Add xfail.

---
 gdb/ChangeLog                                    |  6 ++++++
 gdb/amd64-linux-nat.c                            |  6 ++++++
 gdb/testsuite/ChangeLog                          |  5 +++++
 gdb/testsuite/gdb.arch/amd64-init-x87-values.exp | 19 +++++++++++++++++--
 4 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/gdb/ChangeLog b/gdb/ChangeLog
index 77aab76492..ee53e9c00a 100644
--- a/gdb/ChangeLog
+++ b/gdb/ChangeLog
@@ -1,3 +1,9 @@
+2019-09-24  Tom de Vries  <tdevries@suse.de>
+
+	PR gdb/23815
+	* amd64-linux-nat.c (amd64_linux_nat_target::fetch_registers):
+	Initialize xstateregs before ptrace PTRACE_GETREGSET call.
+
 2019-09-23  Dimitar Dimitrov  <dimitar@dinux.eu>
 
 	* NEWS: Mention new simulator port for PRU.
diff --git a/gdb/amd64-linux-nat.c b/gdb/amd64-linux-nat.c
index 4f1c98a0d1..d0328b677d 100644
--- a/gdb/amd64-linux-nat.c
+++ b/gdb/amd64-linux-nat.c
@@ -238,6 +238,12 @@ amd64_linux_nat_target::fetch_registers (struct regcache *regcache, int regnum)
 	  char xstateregs[X86_XSTATE_MAX_SIZE];
 	  struct iovec iov;
 
+	  /* Pre-4.14 kernels have a bug (fixed by commit 0852b374173b
+	     "x86/fpu: Add FPU state copying quirk to handle XRSTOR failure on
+	     Intel Skylake CPUs") that sometimes causes the mxcsr location in
+	     xstateregs not to be copied by PTRACE_GETREGSET.  Make sure that
+	     the location is at least initialized with a defined value.  */
+	  memset (xstateregs, 0, sizeof (xstateregs));
 	  iov.iov_base = xstateregs;
 	  iov.iov_len = sizeof (xstateregs);
 	  if (ptrace (PTRACE_GETREGSET, tid,
diff --git a/gdb/testsuite/ChangeLog b/gdb/testsuite/ChangeLog
index 37e323f747..706c5da420 100644
--- a/gdb/testsuite/ChangeLog
+++ b/gdb/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2019-09-24  Tom de Vries  <tdevries@suse.de>
+
+	PR gdb/24598
+	* gdb.arch/amd64-init-x87-values.exp: Add xfail.
+
 2019-09-22  Tom de Vries  <tdevries@suse.de>
 
 	* gdb.base/restore.exp: Allow register variables to be optimized out at
diff --git a/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp b/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
index cdf92dcd37..5fd18dbb79 100644
--- a/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
+++ b/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
@@ -116,7 +116,7 @@ proc_with_prefix check_x87_regs_around_init {} {
 # nop that does not enable any FP features).  Finally check that the
 # mxcsr register still has the value we set.
 proc_with_prefix check_setting_mxcsr_before_enable {} {
-    global binfile
+    global binfile gdb_prompt
 
     clean_restart ${binfile}
 
@@ -127,7 +127,22 @@ proc_with_prefix check_setting_mxcsr_before_enable {} {
 
     gdb_test_no_output "set \$mxcsr=0x9f80" "set a new value for MXCSR"
     gdb_test "stepi" "fwait" "step forward one instruction for mxcsr test"
-    gdb_test "p/x \$mxcsr" " = 0x9f80" "check new value of MXCSR is still in place"
+
+    set test "check new value of MXCSR is still in place"
+    set pass_pattern " = 0x9f80"
+    # Pre-4.14 kernels have a bug (fixed by commit 0852b374173b "x86/fpu:
+    # Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake
+    # CPUs") that causes mxcsr not to be copied, in which case we get 0 instead of
+    # the just saved value.
+    set xfail_pattern " = 0x0"
+    gdb_test_multiple "p/x \$mxcsr" $test {
+	-re "\[\r\n\]*(?:$pass_pattern)\[\r\n\]+$gdb_prompt $" {
+	    pass $test
+	}
+	-re "\[\r\n\]*(?:$xfail_pattern)\[\r\n\]+$gdb_prompt $" {
+	    xfail $test
+	}
+    }
 }
 
 # Start the test file, all FP features will be disabled.  Set new

Patch

diff --git a/gdb/amd64-linux-nat.c b/gdb/amd64-linux-nat.c
index 4f1c98a0d1..984b3d7c0f 100644
--- a/gdb/amd64-linux-nat.c
+++ b/gdb/amd64-linux-nat.c
@@ -238,6 +238,7 @@  amd64_linux_nat_target::fetch_registers (struct regcache *regcache, int regnum)
 	  char xstateregs[X86_XSTATE_MAX_SIZE];
 	  struct iovec iov;
 
+	  memset (xstateregs, 0, sizeof (xstateregs));
 	  iov.iov_base = xstateregs;
 	  iov.iov_len = sizeof (xstateregs);
 	  if (ptrace (PTRACE_GETREGSET, tid,
diff --git a/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp b/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
index cdf92dcd37..5fd18dbb79 100644
--- a/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
+++ b/gdb/testsuite/gdb.arch/amd64-init-x87-values.exp
@@ -116,7 +116,7 @@  proc_with_prefix check_x87_regs_around_init {} {
 # nop that does not enable any FP features).  Finally check that the
 # mxcsr register still has the value we set.
 proc_with_prefix check_setting_mxcsr_before_enable {} {
-    global binfile
+    global binfile gdb_prompt
 
     clean_restart ${binfile}
 
@@ -127,7 +127,22 @@  proc_with_prefix check_setting_mxcsr_before_enable {} {
 
     gdb_test_no_output "set \$mxcsr=0x9f80" "set a new value for MXCSR"
     gdb_test "stepi" "fwait" "step forward one instruction for mxcsr test"
-    gdb_test "p/x \$mxcsr" " = 0x9f80" "check new value of MXCSR is still in place"
+
+    set test "check new value of MXCSR is still in place"
+    set pass_pattern " = 0x9f80"
+    # Pre-4.14 kernels have a bug (fixed by commit 0852b374173b "x86/fpu:
+    # Add FPU state copying quirk to handle XRSTOR failure on Intel Skylake
+    # CPUs") that causes mxcsr not to be copied, in which case we get 0 instead of
+    # the just saved value.
+    set xfail_pattern " = 0x0"
+    gdb_test_multiple "p/x \$mxcsr" $test {
+	-re "\[\r\n\]*(?:$pass_pattern)\[\r\n\]+$gdb_prompt $" {
+	    pass $test
+	}
+	-re "\[\r\n\]*(?:$xfail_pattern)\[\r\n\]+$gdb_prompt $" {
+	    xfail $test
+	}
+    }
 }
 
 # Start the test file, all FP features will be disabled.  Set new