gdb: Don't let SIGWINCH interrupt waiting for remote target

Message ID 20200709150228.79385-1-andrew.burgess@embecosm.com
State New
Headers show
Series
  • gdb: Don't let SIGWINCH interrupt waiting for remote target
Related show

Commit Message

Andrew Burgess July 9, 2020, 3:02 p.m.
When connecting to a remote target a SIGWINCH will cause GDB to stop
waiting an complain about an interrupted system call, so:

  (gdb) target remote :1234
  # User resizes termainal...
  :1234: Interrupted system call.
  (gdb)

While investigating I also discovered that hitting Ctrl-C looks like
this:

  (gdb) target remote :1234
  # User hits Ctrl-C
  ^C:1234: Interrupted system call.
  (gdb) Quit
  (gdb)

Which doesn't seem great.

In this commit I wrap the code that waits for a connection in a loop
that will swallow any EINTR errors.  As a result SIGWINCH no longer
causes the connection loop to break, but nor does Ctrl-C.  So I add a
check of the QUIT flag, and now we get:

  (gdb) target remote :1234
  # User resizes termainal, and GDB continues to wait.

And also:

  (gdb) target remote :1234
  # User hits Ctrl-C
  ^CQuit
  (gdb)

Which seems better.

There is a test for this, which I've tried to make resilient against
timing issues, but I don't know how good it will actually turn out to
be.

gdb/ChangeLog:

	PR remote/26221
	* ser-tcp.c (wait_for_connection): Update comment.  Ignore EINTR
	errors, and handle QUIT from user.

gdb/testsuite/ChangeLog:

	PR remote/26221
	* gdb.server/connect-resize-quit.exp: New file.
---
 gdb/ChangeLog                                 |  6 ++
 gdb/ser-tcp.c                                 | 57 ++++++------
 gdb/testsuite/ChangeLog                       |  5 ++
 .../gdb.server/connect-resize-quit.exp        | 88 +++++++++++++++++++
 4 files changed, 131 insertions(+), 25 deletions(-)
 create mode 100644 gdb/testsuite/gdb.server/connect-resize-quit.exp

-- 
2.25.4

Comments

Simon Marchi July 9, 2020, 4 p.m. | #1
Wow thanks, I didn't expect a patch when I filed the bug report, and especially
not that fast :).

> gdb/testsuite/ChangeLog:

> 

> 	PR remote/26221

> 	* gdb.server/connect-resize-quit.exp: New file.


Does the new test really belong in `gdb.server`?  I would imagine that
this directory contains tests specific to GDBserver.  This is more a
test for GDB's remote target.

> +	# Open a connection to a remote target, but use a port number that is

> +	# unlikely to actually be in use.  We want this connection to block

> +	# waiting for a remote so we can test GDB's behaviour in this blocked

> +	# state.

> +	gdb_test_no_output "set tcp connect-timeout 2"

> +

> +	set timeout_count 0

> +	while { $timeout_count < 5} {

> +	    # Try connecting.  This should block waiting for a remote to appear.

> +	    send_gdb "target remote :0\n"


What happens when trying to connect to port 0? At first I thought it would try
to connect to a random port, but that wouldn't make sense (it wouldn't be very
useful).  I confused that with trying to _bind_ to port 0.

According to this:

https://unix.stackexchange.com/questions/180492/is-it-possible-to-connect-to-tcp-port-0

It will actually try to connect to port 0, but since there's no way of binding
to port 0, it will never connect.  Smart!

So it should be fine on Linux, I don't know about other platforms.  I could imagine
other OSes returning EINVAL or something like that.  We'll see.

> +

> +	    # Now send a signal to GDB and follow up by sending some other command.

> +	    set gdb_pid [exp_pid -i [board_info host fileid]]


GDB's spawn id is available as $gdb_spawn_id.  I don't know which is better between
that and `[board_info host fileid]`, but I usually get it from $gdb_spawn_id.

> +	    remote_exec host "kill -${sig} $gdb_pid"

> +	    send_gdb "echo xxyyzz\\n\n"

> +

> +	    # Now try to figure out what GDB did.

> +	    set got_timeout false

> +	    gdb_test_multiple "" "$testname" {

> +		-re "^target remote :0\r\n:0: Connection timed out\\..*$gdb_prompt $" {

> +		    # It's possible that we didn't send the signal quickly enough.

> +		    # Maybe the testing machine is heavily loaded?

> +		    set got_timeout true

> +		}

> +		-re "^target remote :0\r\n:0: Interrupted system call\\.\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {

> +		    fail "$gdb_test_name (interrupted by $sig)"

> +		    return

> +		}

> +		-re "^target remote :0\r\necho xxyyzz\\\\n\r.:0: Connection timed out.\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {

> +		    if { $sig == "WINCH" } {

> +			pass $gdb_test_name

> +		    } else {

> +			fail "$gdb_test_name (unexpected timeout)"

> +		    }

> +		    return

> +		}

> +		-re "^target remote :0\r\nQuit\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {

> +		    if { $sig == "INT" } {

> +			pass $gdb_test_name

> +		    } else {

> +			fail "$gdb_test_name (unexpected QUIT)"

> +		    }

> +		    return

> +		}

> +	    }


I don't remember, what happens if none of the -re match, will that generate
a FAIL?  I just want to be sure that if GDB outputs something else, we'll
see a failure.

Simon
Andrew Burgess July 10, 2020, 1:22 p.m. | #2
* Simon Marchi <simark@simark.ca> [2020-07-09 12:00:09 -0400]:

> Wow thanks, I didn't expect a patch when I filed the bug report, and especially

> not that fast :).

> 

> > gdb/testsuite/ChangeLog:

> > 

> > 	PR remote/26221

> > 	* gdb.server/connect-resize-quit.exp: New file.

> 

> Does the new test really belong in `gdb.server`?  I would imagine that

> this directory contains tests specific to GDBserver.  This is more a

> test for GDB's remote target.

> 

> > +	# Open a connection to a remote target, but use a port number that is

> > +	# unlikely to actually be in use.  We want this connection to block

> > +	# waiting for a remote so we can test GDB's behaviour in this blocked

> > +	# state.

> > +	gdb_test_no_output "set tcp connect-timeout 2"

> > +

> > +	set timeout_count 0

> > +	while { $timeout_count < 5} {

> > +	    # Try connecting.  This should block waiting for a remote to appear.

> > +	    send_gdb "target remote :0\n"

> 

> What happens when trying to connect to port 0? At first I thought it would try

> to connect to a random port, but that wouldn't make sense (it wouldn't be very

> useful).  I confused that with trying to _bind_ to port 0.

> 

> According to this:

> 

> https://unix.stackexchange.com/questions/180492/is-it-possible-to-connect-to-tcp-port-0

> 

> It will actually try to connect to port 0, but since there's no way of binding

> to port 0, it will never connect.  Smart!

> 

> So it should be fine on Linux, I don't know about other platforms.  I could imagine

> other OSes returning EINVAL or something like that.  We'll see.


I had a number of ideas here, in no particular order:

  - Pick a random port number and hope it doesn't connect to
    anything.  Maybe spot if the port does connect and try different
    port numbers.

  - Write an LD_PRELOAD library that replaces connect/select in order
    to force the return values and errno values I want.  This test
    would probably be Linux only though.

  - Use port 0.  Wasn't sure this would work, but knew 0 wasn't a
    valid port for someone to be listening on, tried it, and it seemed
    to work....

I'm happy to go with any of the above, or maybe a different plan
entirely if there are alternative suggestions....

> 

> > +

> > +	    # Now send a signal to GDB and follow up by sending some other command.

> > +	    set gdb_pid [exp_pid -i [board_info host fileid]]

> 

> GDB's spawn id is available as $gdb_spawn_id.  I don't know which is better between

> that and `[board_info host fileid]`, but I usually get it from

> $gdb_spawn_id.


Good point.  I just copied this code from elsewhere in the testsuite.
I've changed this test to use gdb_spawn_id.

> 

> > +	    remote_exec host "kill -${sig} $gdb_pid"

> > +	    send_gdb "echo xxyyzz\\n\n"

> > +

> > +	    # Now try to figure out what GDB did.

> > +	    set got_timeout false

> > +	    gdb_test_multiple "" "$testname" {

> > +		-re "^target remote :0\r\n:0: Connection timed out\\..*$gdb_prompt $" {

> > +		    # It's possible that we didn't send the signal quickly enough.

> > +		    # Maybe the testing machine is heavily loaded?

> > +		    set got_timeout true

> > +		}

> > +		-re "^target remote :0\r\n:0: Interrupted system call\\.\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {

> > +		    fail "$gdb_test_name (interrupted by $sig)"

> > +		    return

> > +		}

> > +		-re "^target remote :0\r\necho xxyyzz\\\\n\r.:0: Connection timed out.\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {

> > +		    if { $sig == "WINCH" } {

> > +			pass $gdb_test_name

> > +		    } else {

> > +			fail "$gdb_test_name (unexpected timeout)"

> > +		    }

> > +		    return

> > +		}

> > +		-re "^target remote :0\r\nQuit\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {

> > +		    if { $sig == "INT" } {

> > +			pass $gdb_test_name

> > +		    } else {

> > +			fail "$gdb_test_name (unexpected QUIT)"

> > +		    }

> > +		    return

> > +		}

> > +	    }

> 

> I don't remember, what happens if none of the -re match, will that generate

> a FAIL?  I just want to be sure that if GDB outputs something else, we'll

> see a failure.


gdb_test_multiple adds a bunch of default patterns, including a
generic, you reached the prompt pattern that results in an error.  I
tested this by deliberately breaking all my patterns and the test
finishes instantly (no timeouts) and registers some failures.

Thanks,
Andrew

Patch

diff --git a/gdb/ser-tcp.c b/gdb/ser-tcp.c
index 7dd903dfaad..8fc19b959fc 100644
--- a/gdb/ser-tcp.c
+++ b/gdb/ser-tcp.c
@@ -82,10 +82,12 @@  static unsigned int tcp_retry_limit = 15;
 
 #define POLL_INTERVAL 5
 
-/* Helper function to wait a while.  If SOCK is not -1, wait on its
-   file descriptor.  Otherwise just wait on a timeout, updating
-   *POLLS.  Returns -1 on timeout or interrupt, otherwise the value of
-   select.  */
+/* Helper function to wait a while.  If SOCK is not -1, wait on its file
+   descriptor.  Otherwise just wait on a timeout, updating *POLLS.
+
+   Returns -1 on timeout, otherwise the value of select.  Interrupts
+   (EINTR) are swallowed within this function, however, this function will
+   throw an error if the user interrupts the wait.  */
 
 static int
 wait_for_connect (int sock, unsigned int *polls)
@@ -122,29 +124,34 @@  wait_for_connect (int sock, unsigned int *polls)
       t.tv_usec = 0;
     }
 
-  if (sock >= 0)
+  do
     {
-      fd_set rset, wset, eset;
-
-      FD_ZERO (&rset);
-      FD_SET (sock, &rset);
-      wset = rset;
-      eset = rset;
-
-      /* POSIX systems return connection success or failure by signalling
-	 wset.  Windows systems return success in wset and failure in
-	 eset.
-
-	 We must call select here, rather than gdb_select, because
-	 the serial structure has not yet been initialized - the
-	 MinGW select wrapper will not know that this FD refers
-	 to a socket.  */
-      n = select (sock + 1, &rset, &wset, &eset, &t);
+      QUIT;
+      if (sock >= 0)
+	{
+	  fd_set rset, wset, eset;
+
+	  FD_ZERO (&rset);
+	  FD_SET (sock, &rset);
+	  wset = rset;
+	  eset = rset;
+
+	  /* POSIX systems return connection success or failure by signalling
+	     wset.  Windows systems return success in wset and failure in
+	     eset.
+
+	     We must call select here, rather than gdb_select, because
+	     the serial structure has not yet been initialized - the
+	     MinGW select wrapper will not know that this FD refers
+	     to a socket.  */
+	  n = select (sock + 1, &rset, &wset, &eset, &t);
+	}
+      else
+	/* Use gdb_select here, since we have no file descriptors, and on
+	   Windows, plain select doesn't work in that case.  */
+	n = gdb_select (0, NULL, NULL, NULL, &t);
     }
-  else
-    /* Use gdb_select here, since we have no file descriptors, and on
-       Windows, plain select doesn't work in that case.  */
-    n = gdb_select (0, NULL, NULL, NULL, &t);
+  while (n == -1 && errno == EINTR);
 
   /* If we didn't time out, only count it as one poll.  */
   if (n > 0 || *polls < POLL_INTERVAL)
diff --git a/gdb/testsuite/gdb.server/connect-resize-quit.exp b/gdb/testsuite/gdb.server/connect-resize-quit.exp
new file mode 100644
index 00000000000..cd5db4a3675
--- /dev/null
+++ b/gdb/testsuite/gdb.server/connect-resize-quit.exp
@@ -0,0 +1,88 @@ 
+# Copyright 2020 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+# Check that a terminal resize while waiting to connect to a target
+# doesn't cause the wait to be abandoned.  Then check that the wait
+# can be interrupted with SIGINT.
+
+proc test_signal { sig } {
+    global gdb_prompt
+
+    with_test_prefix "SIG${sig}" {
+
+	set testname "waiting on target remote"
+
+	gdb_start
+
+	# Open a connection to a remote target, but use a port number that is
+	# unlikely to actually be in use.  We want this connection to block
+	# waiting for a remote so we can test GDB's behaviour in this blocked
+	# state.
+	gdb_test_no_output "set tcp connect-timeout 2"
+
+	set timeout_count 0
+	while { $timeout_count < 5} {
+	    # Try connecting.  This should block waiting for a remote to appear.
+	    send_gdb "target remote :0\n"
+
+	    # Now send a signal to GDB and follow up by sending some other command.
+	    set gdb_pid [exp_pid -i [board_info host fileid]]
+	    remote_exec host "kill -${sig} $gdb_pid"
+	    send_gdb "echo xxyyzz\\n\n"
+
+	    # Now try to figure out what GDB did.
+	    set got_timeout false
+	    gdb_test_multiple "" "$testname" {
+		-re "^target remote :0\r\n:0: Connection timed out\\..*$gdb_prompt $" {
+		    # It's possible that we didn't send the signal quickly enough.
+		    # Maybe the testing machine is heavily loaded?
+		    set got_timeout true
+		}
+		-re "^target remote :0\r\n:0: Interrupted system call\\.\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {
+		    fail "$gdb_test_name (interrupted by $sig)"
+		    return
+		}
+		-re "^target remote :0\r\necho xxyyzz\\\\n\r.:0: Connection timed out.\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {
+		    if { $sig == "WINCH" } {
+			pass $gdb_test_name
+		    } else {
+			fail "$gdb_test_name (unexpected timeout)"
+		    }
+		    return
+		}
+		-re "^target remote :0\r\nQuit\r\n$gdb_prompt.*xxyyzz.*$gdb_prompt $" {
+		    if { $sig == "INT" } {
+			pass $gdb_test_name
+		    } else {
+			fail "$gdb_test_name (unexpected QUIT)"
+		    }
+		    return
+		}
+	    }
+
+	    if { ! $got_timeout } {
+		# Some other unknown error.
+		return
+	    }
+
+	    incr timeout_count
+	}
+
+	unresolved "$testname (too many timeouts)"
+    }
+}
+
+test_signal WINCH
+test_signal INT