PR libgfortran/83649 Chunk large reads and writes

Message ID 1514899317-11668-1-git-send-email-blomqvist.janne@gmail.com

Commit Message

Janne Blomqvist Jan. 2, 2018, 1:21 p.m.
It turns out that Linux never reads or writes more than 2147479552
bytes in a single syscall. For writes this is not a problem, as
libgfortran already loops around write() to handle short writes. For
reads, however, we cannot loop unconditionally, since a short read
while reading from the terminal would then make read() hang waiting
for more input.  Also, there are reports that macOS fails I/O requests
larger than 2 GB.  Thus, to work around these issues, do large reads
and writes in chunks.

The testcase from the PR

program largewr
  integer(kind=1) :: a(2_8**31+1)
  a = 0
  a(size(a, kind=8)) = 1
  open(10, file="largewr.dat", access="stream", form="unformatted")
  write (10) a
  close(10)
  a(size(a, kind=8)) = 2
  open(10, file="largewr.dat", access="stream", form="unformatted")
  read (10) a
  if (a(size(a, kind=8)) == 1) then
    print *, "All is well"
  else
    print *, "Oh no"
  end if
end program largewr

fails on trunk but works with the patch.

Regtested on x86_64-pc-linux-gnu, committed to trunk.

libgfortran/ChangeLog:

2018-01-02  Janne Blomqvist  <jb@gcc.gnu.org>

	PR libgfortran/83649
	* io/unix.c (MAX_CHUNK): New define.
	(raw_read): For reads larger than MAX_CHUNK, loop.
	(raw_write): Write no more than MAX_CHUNK bytes per iteration.
---
 libgfortran/io/unix.c | 50 ++++++++++++++++++++++++++++++++++++++++++--------
 1 file changed, 42 insertions(+), 8 deletions(-)

-- 
2.7.4

Comments

Thomas Koenig Jan. 2, 2018, 2:15 p.m. | #1
Hi Janne,

> +#define MAX_CHUNK 2147479552


Two things to discuss:

Should this also be backported to the 7-branch?

Also, we could set the default subrecord length for unformatted
I/O to the same value. Save a syscall :-)

What do you think?

Regards

	Thomas
Janne Blomqvist Jan. 2, 2018, 8:33 p.m. | #2
On Tue, Jan 2, 2018 at 4:15 PM, Thomas Koenig <tkoenig@netcologne.de> wrote:
> Hi Janne,
>
>> +#define MAX_CHUNK 2147479552
>
> Two things to discuss:
>
> Should this also be backported to the 7-branch?

Probably yes. Is the 6 branch still active? If so, maybe that one as well.

> Also, we could set the default subrecord length for unformatted
> I/O to the same value. Save a syscall :-)
>
> What do you think?

I played around a bit with slightly modified dd test scripts found at
http://blog.tdg5.com/tuning-dd-block-size/ and it seems the optimal
transfer size is actually quite small; bigger isn't necessarily
better.  On an NVMe SSD, write performance basically flatlined above
2048 bytes; for reads the best was around 64 kB at 10.5 GB/s, after
which throughput slowly dropped, reaching about 7.1 GB/s at a 512 MB
block size.  For copying from /dev/zero to /dev/null I got the best
speed at a 128 kB block size. That said, I suspect small blocks win in
this kind of copy operation because they fit into the CPU cache, so
the numbers may not carry over to an application that reads or writes
a large file to/from disk.


-- 
Janne Blomqvist

Patch

diff --git a/libgfortran/io/unix.c b/libgfortran/io/unix.c
index a07a3c9..7a982b3 100644
--- a/libgfortran/io/unix.c
+++ b/libgfortran/io/unix.c
@@ -292,18 +292,49 @@  raw_flush (unix_stream *s  __attribute__ ((unused)))
   return 0;
 }
 
+/* Write/read at most 2 GB - 4k chunks at a time. Linux never reads or
+   writes more than this, and there are reports that macOS fails for
+   larger than 2 GB as well.  */
+#define MAX_CHUNK 2147479552
+
 static ssize_t
 raw_read (unix_stream *s, void *buf, ssize_t nbyte)
 {
   /* For read we can't do I/O in a loop like raw_write does, because
      that will break applications that wait for interactive I/O.  We
-     still can loop around EINTR, though.  */
-  while (true)
+     still can loop around EINTR, though.  This however causes a
+     problem for large reads which must be chunked, see comment above.
+     So assume that if the size is larger than the chunk size, we're
+     reading from a file and not the terminal.  */
+  if (nbyte <= MAX_CHUNK)
     {
-      ssize_t trans = read (s->fd, buf, nbyte);
-      if (trans == -1 && errno == EINTR)
-	continue;
-      return trans;
+      while (true)
+	{
+	  ssize_t trans = read (s->fd, buf, nbyte);
+	  if (trans == -1 && errno == EINTR)
+	    continue;
+	  return trans;
+	}
+    }
+  else
+    {
+      ssize_t bytes_left = nbyte;
+      char *buf_st = buf;
+      while (bytes_left > 0)
+	{
+	  ssize_t to_read = bytes_left < MAX_CHUNK ? bytes_left: MAX_CHUNK;
+	  ssize_t trans = read (s->fd, buf_st, to_read);
+	  if (trans == -1)
+	    {
+	      if (errno == EINTR)
+		continue;
+	      else
+		return trans;
+	    }
+	  buf_st += trans;
+	  bytes_left -= trans;
+	}
+      return nbyte - bytes_left;
     }
 }
 
@@ -317,10 +348,13 @@  raw_write (unix_stream *s, const void *buf, ssize_t nbyte)
   buf_st = (char *) buf;
 
   /* We must write in a loop since some systems don't restart system
-     calls in case of a signal.  */
+     calls in case of a signal.  Also some systems might fail outright
+     if we try to write more than 2 GB in a single syscall, so chunk
+     up large writes.  */
   while (bytes_left > 0)
     {
-      trans = write (s->fd, buf_st, bytes_left);
+      ssize_t to_write = bytes_left < MAX_CHUNK ? bytes_left: MAX_CHUNK;
+      trans = write (s->fd, buf_st, to_write);
       if (trans == -1)
 	{
 	  if (errno == EINTR)