[1/5] glibc: Perform rseq(2) registration at C startup and thread creation (v10)

Message ID 20190503184219.19266-2-mathieu.desnoyers@efficios.com
State New
Series
  • Restartable Sequences support for glibc 2.30

Commit Message

Mathieu Desnoyers May 3, 2019, 6:42 p.m.
Register rseq(2) TLS for each thread (including main), and unregister
for each thread (excluding main). "rseq" stands for Restartable
Sequences.

See the rseq(2) man page proposed here:
  https://lkml.org/lkml/2018/9/19/647

This patch is based on glibc-2.29. The rseq(2) system call was merged
into Linux 4.18.
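The per-thread registration described above can be sketched as a standalone program that calls the raw rseq(2) system call directly instead of going through glibc's internal helpers. This is a hedged sketch, not the patch's rseq-internal.h code. Every demo_* name is hypothetical. The struct mirrors the original 32-byte kernel layout, and the signature value is an assumption (the x86 one). On glibc 2.35 or later, which registers rseq by itself, the call is expected to fail because the thread already has an area registered.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical mirror of the original 32-byte struct rseq from
   <linux/rseq.h>; the real header uses __u32/__u64 and a union.  */
struct demo_rseq
{
  uint32_t cpu_id_start;
  int32_t cpu_id;        /* >= 0 once registered, negative sentinel otherwise.  */
  uint64_t rseq_cs;
  uint32_t flags;
} __attribute__ ((aligned (32)));

/* Sentinels mirroring enum rseq_cpu_id_state in <linux/rseq.h>.  */
enum
{
  DEMO_CPU_ID_UNINITIALIZED = -1,
  DEMO_CPU_ID_REGISTRATION_FAILED = -2,
};

/* Assumption: the x86 signature value.  In glibc, RSEQ_SIG would come from
   a per-architecture bits/rseq.h, and registration needs both it and
   __NR_rseq.  */
#define DEMO_RSEQ_SIG 0x53053053
#define DEMO_RSEQ_FLAG_UNREGISTER 1

/* One area per thread, like __rseq_abi in the patch.  */
static __thread struct demo_rseq demo_rseq_area = {
  .cpu_id = DEMO_CPU_ID_UNINITIALIZED,
};

/* Call at thread start (for the main thread: at startup).
   Returns 0 on success, -1 on failure.  */
static int
demo_rseq_register_current_thread (void)
{
#if defined (__NR_rseq) && defined (DEMO_RSEQ_SIG)
  if (syscall (__NR_rseq, &demo_rseq_area, sizeof demo_rseq_area, 0,
               DEMO_RSEQ_SIG) == 0)
    return 0;
#else
  errno = ENOSYS;
#endif
  /* Publish the failure so that readers use a fallback path.  */
  __atomic_store_n (&demo_rseq_area.cpu_id, DEMO_CPU_ID_REGISTRATION_FAILED,
                    __ATOMIC_RELAXED);
  return -1;
}

/* Call before a non-main thread exits, matching the patch's behavior of
   unregistering every thread except main.  */
static int
demo_rseq_unregister_current_thread (void)
{
#if defined (__NR_rseq) && defined (DEMO_RSEQ_SIG)
  return (int) syscall (__NR_rseq, &demo_rseq_area, sizeof demo_rseq_area,
                        DEMO_RSEQ_FLAG_UNREGISTER, DEMO_RSEQ_SIG);
#else
  errno = ENOSYS;
  return -1;
#endif
}
```

A caller would check the result once. On failure it relies on the REGISTRATION_FAILED sentinel rather than retrying on every access.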

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>

CC: Carlos O'Donell <carlos@redhat.com>
CC: Florian Weimer <fweimer@redhat.com>
CC: Joseph Myers <joseph@codesourcery.com>
CC: Szabolcs Nagy <szabolcs.nagy@arm.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: Ben Maurer <bmaurer@fb.com>
CC: Peter Zijlstra <peterz@infradead.org>
CC: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
CC: Boqun Feng <boqun.feng@gmail.com>
CC: Will Deacon <will.deacon@arm.com>
CC: Dave Watson <davejwatson@fb.com>
CC: Paul Turner <pjt@google.com>
CC: Rich Felker <dalias@libc.org>
CC: libc-alpha@sourceware.org
CC: linux-kernel@vger.kernel.org
CC: linux-api@vger.kernel.org
---
Changes since v1:
- Move __rseq_refcount to an extra field at the end of __rseq_abi to
  eliminate one symbol.

  All libraries/programs which try to register rseq (glibc,
  early-adopter applications, early-adopter libraries) should use the
  rseq refcount. It becomes part of the ABI within a user-space
  process, but it's not part of the ABI shared with the kernel per se.

- Restructure how this code is organized so glibc keeps building on
  non-Linux targets.

- Use non-weak symbol for __rseq_abi.

- Move rseq registration/unregistration implementation into its own
  nptl/rseq.c compile unit.

- Move __rseq_abi symbol under GLIBC_2.29.

Changes since v2:
- Move __rseq_refcount to its own symbol, which is less ugly than
  trying to play tricks with the rseq uapi.
- Move __rseq_abi from nptl to csu (C start up), so it can be used
  across glibc, including memory allocator and sched_getcpu(). The
  __rseq_refcount symbol is kept in nptl, because there is no reason
  to use it elsewhere in glibc.

Changes since v3:
- Set __rseq_refcount TLS to 1 on register/set to 0 on unregister
  because glibc is the first/last user.
- Unconditionally register/unregister rseq at thread start/exit, because
  glibc is the first/last user.
- Add missing abilist items.
- Rebase on glibc master commit a502c5294.
- Add NEWS entry.

Changes since v4:
- Do not use "weak" symbols for __rseq_abi and __rseq_refcount. Based on
  "System V Application Binary Interface", weak only affects the link
  editor, not the dynamic linker.
- Install a new sys/rseq.h system header on Linux, which contains the
  RSEQ_SIG definition, __rseq_abi declaration and __rseq_refcount
  declaration. Move those definition/declarations from rseq-internal.h
  to the installed sys/rseq.h header.
- Considering that rseq is only available on Linux, move csu/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c.
- Move __rseq_refcount from nptl/rseq.c to
  sysdeps/unix/sysv/linux/rseq-sym.c, so it is only defined on Linux.
- Move both ABI definitions for __rseq_abi and __rseq_refcount to
  sysdeps/unix/sysv/linux/Versions, so they only appear on Linux.
- Document __rseq_abi and __rseq_refcount volatile.
- Document the RSEQ_SIG signature define.
- Move registration functions from rseq.c to rseq-internal.h static
  inline functions. Introduce empty stubs in misc/rseq-internal.h,
  which can be overridden by architecture code in
  sysdeps/unix/sysv/linux/rseq-internal.h.
- Rename __rseq_register_current_thread and __rseq_unregister_current_thread
  to rseq_register_current_thread and rseq_unregister_current_thread,
  now that those are only visible as internal static inline functions.
- Invoke rseq_register_current_thread() from libc-start.c LIBC_START_MAIN
  rather than nptl init, so applications not linked against
  libpthread.so have rseq registered for their main() thread. Note that
  it is invoked separately for SHARED and !SHARED builds.

Changes since v5:
- Replace __rseq_refcount by __rseq_lib_abi, which contains two
  uint32_t: register_state and refcount. The "register_state" field
  allows inhibiting rseq registration from signal handlers nested on top
  of glibc registration and occurring after rseq unregistration by glibc.
- Introduce enum rseq_register_state, which contains the states allowed
  for the struct rseq_lib_abi register_state field.

Changes since v6:
- Introduce bits/rseq.h to define RSEQ_SIG for each architecture.
  The generic bits/rseq.h does not define RSEQ_SIG, meaning that each
  architecture implementing rseq needs to implement bits/rseq.h.
- Rename enum item RSEQ_REGISTER_NESTED to RSEQ_REGISTER_ONGOING.
- Port to glibc-2.29.

Changes since v7:
- Remove __rseq_lib_abi symbol, including refcount and register_state
  fields.
- Remove reference counting and nested signals handling from
  registration/unregistration functions.
- Introduce new __rseq_handled exported symbol, which is set to 1
  by glibc on C startup when it handles restartable sequences.
  This allows glibc to coexist with early adopter libraries and
  applications wishing to register restartable sequences when it
  is not handled by glibc.
- Introduce rseq_init (), which sets __rseq_handled to 1 from
  C startup.
- Update NEWS entry.
- Update comments at the beginning of new files.
- Registration depends on both __NR_rseq and RSEQ_SIG.
- Remove ARM, powerpc, MIPS RSEQ_SIG until we agree with maintainers
  on the signature choice.
- Update x86, s390 RSEQ_SIG based on discussion with arch maintainers.
- Remove rseq-internal.h from headers list of misc/Makefile, so it
  is not installed by make install.

Changes since v8:
- Introduce RSEQ_SIG_CODE and RSEQ_SIG_DATA on aarch64 to handle
  compiling with -mbig-endian.
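The reason for two constants can be sketched as follows. An A64 instruction is always stored little-endian, but the kernel checks the signature by loading the word as ordinary data, so a -mbig-endian process sees its bytes reversed. The value 0xd428bc00 (BRK #0x45E0) is an assumption taken from the Linux rseq selftests. Real headers would select the variant at compile time with __AARCH64EB__; the runtime flag here exists only so that both cases can be exercised.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed aarch64 signature instruction encoding (BRK #0x45E0).  */
#define DEMO_RSEQ_SIG_CODE 0xd428bc00u

/* Value to pass to rseq(2): the instruction word as the process reads it
   as data.  Big-endian data views byte-swap the little-endian encoding.  */
static uint32_t
demo_rseq_sig_data (int big_endian_data)
{
  return big_endian_data ? __builtin_bswap32 (DEMO_RSEQ_SIG_CODE)
                         : DEMO_RSEQ_SIG_CODE;
}
```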

Changes since v9:
- Update Changelog.
- Remove unneeded new file comment header newlines.

---
 ChangeLog                                     | 47 ++++++++++
 NEWS                                          | 15 ++++
 csu/libc-start.c                              | 14 ++-
 misc/rseq-internal.h                          | 38 ++++++++
 nptl/pthread_create.c                         |  9 ++
 sysdeps/unix/sysv/linux/Makefile              |  4 +-
 sysdeps/unix/sysv/linux/Versions              |  4 +
 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h   | 43 +++++++++
 sysdeps/unix/sysv/linux/aarch64/libc.abilist  |  2 +
 sysdeps/unix/sysv/linux/alpha/libc.abilist    |  2 +
 sysdeps/unix/sysv/linux/arm/libc.abilist      |  2 +
 sysdeps/unix/sysv/linux/bits/rseq.h           | 29 ++++++
 sysdeps/unix/sysv/linux/csky/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/hppa/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/i386/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/ia64/libc.abilist     |  2 +
 .../sysv/linux/m68k/coldfire/libc.abilist     |  2 +
 .../unix/sysv/linux/m68k/m680x0/libc.abilist  |  2 +
 .../unix/sysv/linux/microblaze/libc.abilist   |  2 +
 .../sysv/linux/mips/mips32/fpu/libc.abilist   |  2 +
 .../sysv/linux/mips/mips32/nofpu/libc.abilist |  2 +
 .../sysv/linux/mips/mips64/n32/libc.abilist   |  2 +
 .../sysv/linux/mips/mips64/n64/libc.abilist   |  2 +
 sysdeps/unix/sysv/linux/nios2/libc.abilist    |  2 +
 .../linux/powerpc/powerpc32/fpu/libc.abilist  |  2 +
 .../powerpc/powerpc32/nofpu/libc.abilist      |  2 +
 .../linux/powerpc/powerpc64/be/libc.abilist   |  2 +
 .../linux/powerpc/powerpc64/le/libc.abilist   |  2 +
 .../unix/sysv/linux/riscv/rv64/libc.abilist   |  2 +
 sysdeps/unix/sysv/linux/rseq-internal.h       | 88 +++++++++++++++++++
 sysdeps/unix/sysv/linux/rseq-sym.c            | 63 +++++++++++++
 sysdeps/unix/sysv/linux/s390/bits/rseq.h      | 30 +++++++
 .../unix/sysv/linux/s390/s390-32/libc.abilist |  2 +
 .../unix/sysv/linux/s390/s390-64/libc.abilist |  2 +
 sysdeps/unix/sysv/linux/sh/libc.abilist       |  2 +
 .../sysv/linux/sparc/sparc32/libc.abilist     |  2 +
 .../sysv/linux/sparc/sparc64/libc.abilist     |  2 +
 sysdeps/unix/sysv/linux/sys/rseq.h            | 50 +++++++++++
 sysdeps/unix/sysv/linux/x86/bits/rseq.h       | 30 +++++++
 .../unix/sysv/linux/x86_64/64/libc.abilist    |  2 +
 .../unix/sysv/linux/x86_64/x32/libc.abilist   |  2 +
 41 files changed, 513 insertions(+), 5 deletions(-)
 create mode 100644 misc/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-internal.h
 create mode 100644 sysdeps/unix/sysv/linux/rseq-sym.c
 create mode 100644 sysdeps/unix/sysv/linux/s390/bits/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/sys/rseq.h
 create mode 100644 sysdeps/unix/sysv/linux/x86/bits/rseq.h

-- 
2.17.1

Comments

Florian Weimer May 27, 2019, 11:19 a.m. | #1
* Mathieu Desnoyers:

> +/* volatile because fields can be read/updated by the kernel.  */
> +__thread volatile struct rseq __rseq_abi = {
> +  .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
> +};


As I've explained repeatedly, the volatile qualifier is wrong because it
is impossible to get rid of it.  (Accessing an object declared volatile
using non-volatile pointers is undefined.)  Code using __rseq_abi should
use relaxed MO atomics or signal fences/compiler barriers, as
appropriate.
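The sketch below illustrates the alternative described here. It declares a stand-in area without volatile and reads its cpu_id field through the GCC/Clang `__atomic_load_n` builtin with relaxed ordering. glibc's internal `atomic_load_relaxed` plays the same role. `demo_tls_area` and the helpers are hypothetical; the fallback uses the public `sched_getcpu ()`.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel-updated TLS area, with no volatile.  */
struct demo_area
{
  uint32_t cpu_id_start;
  int32_t cpu_id;
};

static __thread struct demo_area demo_tls_area = { .cpu_id = -1 };

/* Relaxed atomic load: a single, untorn access that the compiler must
   re-issue on every call, without emitting a hardware barrier.  */
static int
demo_read_cpu_id (void)
{
  return __atomic_load_n (&demo_tls_area.cpu_id, __ATOMIC_RELAXED);
}

/* Use the cached value when it is valid, otherwise ask the kernel.  */
static int
demo_getcpu (void)
{
  int cpu = demo_read_cpu_id ();
  return cpu >= 0 ? cpu : sched_getcpu ();
}
```

Writers, such as a test standing in for the kernel, would use `__atomic_store_n` with the same ordering, so the object is never accessed through a volatile-qualified declaration.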

> +/* Advertise Restartable Sequences registration ownership across
> +   application and shared libraries.
> +
> +   Libraries and applications must check whether this variable is zero or
> +   non-zero if they wish to perform rseq registration on their own. If it
> +   is zero, it means restartable sequence registration is not handled, and
> +   the library or application is free to perform rseq registration. In
> +   that case, the library or application is taking ownership of rseq
> +   registration, and may set __rseq_handled to 1. It may then set it back
> +   to 0 after it completes unregistering rseq.
> +
> +   If __rseq_handled is found to be non-zero, it means that another
> +   library (or the application) is currently handling rseq registration.
> +
> +   Typical use of __rseq_handled is within library constructors and
> +   destructors, or at program startup.  */
> +
> +int __rseq_handled;


It's not clear to me whether the intent is that __rseq_handled reflects
kernel support for rseq or not.  Currently, it only tells us whether
glibc has been built with rseq support or not.  It does not reflect
kernel support.  I'm still not convinced that this symbol is necessary,
especially if we mandate a kernel header version which defines __NR_rseq
for building glibc (which may happen due to the time64_t work).

Furthermore, the reference to ELF constructors is misleading.  I believe
the code you added to __libc_start_main to initialize __rseq_handled and
register __rseq_abi with the kernel runs *after* ELF constructors have
executed (and not at all if the main program is written in Go, alas).
All initialization activity for the shared case needs to happen in
elf/rtld.c or called from there, probably as part of the security
initialization code or thereabouts.

Thanks,
Florian
Mathieu Desnoyers May 27, 2019, 7:27 p.m. | #2
----- On May 27, 2019, at 7:19 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> +/* volatile because fields can be read/updated by the kernel.  */
>> +__thread volatile struct rseq __rseq_abi = {
>> +  .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
>> +};
> 
> As I've explained repeatedly, the volatile qualifier is wrong because it
> is impossible to get rid of it.  (Accessing an object declared volatile
> using non-volatile pointers is undefined.)  Code using __rseq_abi should
> use relaxed MO atomics or signal fences/compiler barriers, as
> appropriate.


Hi Florian,

OK. So let's remove the volatile.

This means the sched_getcpu() implementation will need to load __rseq_abi.cpu_id
with an atomic_load_relaxed(), am I correct ?

This field can be updated by the kernel at any point of user-space execution
due to preemption, so we need to ensure the load is performed as a single
instruction to prevent the compiler from doing load tearing, and to force it
to re-fetch the value within loops.

It would become:

int
sched_getcpu (void)
{
  int cpu_id = atomic_load_relaxed (&__rseq_abi.cpu_id);

  return cpu_id >= 0 ? cpu_id : vsyscall_sched_getcpu ();
}

> 
>> +/* Advertise Restartable Sequences registration ownership across
>> +   application and shared libraries.
>> +
>> +   Libraries and applications must check whether this variable is zero or
>> +   non-zero if they wish to perform rseq registration on their own. If it
>> +   is zero, it means restartable sequence registration is not handled, and
>> +   the library or application is free to perform rseq registration. In
>> +   that case, the library or application is taking ownership of rseq
>> +   registration, and may set __rseq_handled to 1. It may then set it back
>> +   to 0 after it completes unregistering rseq.
>> +
>> +   If __rseq_handled is found to be non-zero, it means that another
>> +   library (or the application) is currently handling rseq registration.
>> +
>> +   Typical use of __rseq_handled is within library constructors and
>> +   destructors, or at program startup.  */
>> +
>> +int __rseq_handled;
> 
> It's not clear to me whether the intent is that __rseq_handled reflects
> kernel support for rseq or not.


If __rseq_handled is set, it means a library is managing the rseq registration.
It is independent from the fact that the kernel supports rseq or not.

If e.g. glibc manages rseq registration, it sets __rseq_handled to 1. It will
then query the kernel for rseq availability. If the kernel happens to not
support rseq, the __rseq_abi.cpu_id will be set to RSEQ_CPU_ID_REGISTRATION_FAILED,
which means the registration has failed.

The kernel does not support rseq in that scenario, and it would be pointless
for an early adopter library to try to also register it.

As soon as a library changes the state of __rseq_abi.cpu_id, it is indeed
managing rseq registration. Perhaps the meaning of "handling" rseq registration
should be clarified in the comment.
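The ownership protocol in the proposed comment can be sketched as what an early-adopter library might do in its constructor and destructor. `demo_rseq_handled` is a hypothetical stand-in for the exported `__rseq_handled` symbol. The check-then-set is not atomic, so this assumes single-threaded startup, which is where the patch comment says the variable is typically used.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the exported __rseq_handled symbol.  */
static int demo_rseq_handled;

/* Library constructor side: returns true if this library took ownership
   and may now register rseq itself.  */
static bool
demo_library_claim_rseq (void)
{
  if (demo_rseq_handled != 0)
    return false;            /* glibc or another library already handles it.  */
  demo_rseq_handled = 1;     /* Take ownership before registering.  */
  return true;
}

/* Library destructor side: release ownership after unregistering rseq.  */
static void
demo_library_release_rseq (bool owned)
{
  if (owned)
    demo_rseq_handled = 0;
}
```

A library built against an older glibc that follows this pattern leaves registration alone when a newer glibc has already set the flag at startup.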

> Currently, it only tells us whether
> glibc has been built with rseq support or not.  It does not reflect
> kernel support.


We know we have kernel support if __rseq_abi.cpu_id >= 0.
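The distinction between "registration handled" and "kernel support" can be expressed as a small classifier over the cpu_id field. The sentinel values mirror `enum rseq_cpu_id_state` in `<linux/rseq.h>`. The status names and the function are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Sentinels mirroring enum rseq_cpu_id_state in <linux/rseq.h>.  */
#define DEMO_CPU_ID_UNINITIALIZED (-1)
#define DEMO_CPU_ID_REGISTRATION_FAILED (-2)

/* Hypothetical summary of what a reader of cpu_id can conclude.  */
enum demo_rseq_status
{
  DEMO_RSEQ_REGISTERED,       /* cpu_id >= 0: the kernel supports rseq.  */
  DEMO_RSEQ_NOT_REGISTERED,   /* Initial value: no registration in effect.  */
  DEMO_RSEQ_FAILED,           /* Registration was attempted and failed.  */
  DEMO_RSEQ_UNKNOWN           /* Any other negative value.  */
};

static enum demo_rseq_status
demo_classify_cpu_id (int32_t cpu_id)
{
  if (cpu_id >= 0)
    return DEMO_RSEQ_REGISTERED;
  if (cpu_id == DEMO_CPU_ID_UNINITIALIZED)
    return DEMO_RSEQ_NOT_REGISTERED;
  if (cpu_id == DEMO_CPU_ID_REGISTRATION_FAILED)
    return DEMO_RSEQ_FAILED;
  return DEMO_RSEQ_UNKNOWN;
}
```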

>  I'm still not convinced that this symbol is necessary,
> especially if we mandate a kernel header version which defines __NR_rseq
> for building glibc (which may happen due to the time64_t work).


__NR_rseq is not yet supported by all Linux architectures. So we will need
to support building glibc against kernel headers that do not define __NR_rseq
for quite a while anyway.

Moreover, this does not solve the issue tackled by __rseq_handled: early
adopter libraries managing rseq registration built against older glibc
versions which eventually end up running within a process linked against
a newer glibc which handles rseq registration.

> 
> Furthermore, the reference to ELF constructors is misleading.  I believe
> the code you added to __libc_start_main to initialize __rseq_handled and
> register __rseq_abi with the kernel runs *after* ELF constructors have
> executed (and not at all if the main program is written in Go, alas).
> All initialization activity for the shared case needs to happen in
> elf/rtld.c or called from there, probably as part of the security
> initialization code or thereabouts.


in elf/rtld.c:dl_main() we have the following code:

  /* We do not initialize any of the TLS functionality unless any of the
     initial modules uses TLS.  This makes dynamic loading of modules with
     TLS impossible, but to support it requires either eagerly doing setup
     now or lazily doing it later.  Doing it now makes us incompatible with
     an old kernel that can't perform TLS_INIT_TP, even if no TLS is ever
     used.  Trying to do it lazily is too hairy to try when there could be
     multiple threads (from a non-TLS-using libpthread).  */
  bool was_tls_init_tp_called = tls_init_tp_called;
  if (tcbp == NULL)
    tcbp = init_tls ();

If I understand your point correctly, I should move the rseq_init() and
rseq_register_current_thread() for the SHARED case just after this
initialization, otherwise calling those from LIBC_START_MAIN() is too
late and it runs after initial modules constructors (or not at all for
Go). However, this means glibc will start using TLS internally. I'm
concerned that this is not quite in line with the above comment which
states that TLS is not initialized if no initial modules use TLS.

For the !SHARED use-case, if my understanding is correct, I should keep
rseq_init() and rseq_register_current_thread() calls within LIBC_START_MAIN().

Thoughts ?

Thanks for the feedback!

Mathieu



> 
> Thanks,
> Florian


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com
Mathieu Desnoyers May 29, 2019, 3:45 p.m. | #3
----- On May 27, 2019, at 3:27 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On May 27, 2019, at 7:19 AM, Florian Weimer fweimer@redhat.com wrote:
> 

[...]

>> 
>> Furthermore, the reference to ELF constructors is misleading.  I believe
>> the code you added to __libc_start_main to initialize __rseq_handled and
>> register __rseq_abi with the kernel runs *after* ELF constructors have
>> executed (and not at all if the main program is written in Go, alas).
>> All initialization activity for the shared case needs to happen in
>> elf/rtld.c or called from there, probably as part of the security
>> initialization code or thereabouts.
> 
> in elf/rtld.c:dl_main() we have the following code:
> 
>  /* We do not initialize any of the TLS functionality unless any of the
>     initial modules uses TLS.  This makes dynamic loading of modules with
>     TLS impossible, but to support it requires either eagerly doing setup
>     now or lazily doing it later.  Doing it now makes us incompatible with
>     an old kernel that can't perform TLS_INIT_TP, even if no TLS is ever
>     used.  Trying to do it lazily is too hairy to try when there could be
>     multiple threads (from a non-TLS-using libpthread).  */
>  bool was_tls_init_tp_called = tls_init_tp_called;
>  if (tcbp == NULL)
>    tcbp = init_tls ();
> 
> If I understand your point correctly, I should move the rseq_init() and
> rseq_register_current_thread() for the SHARED case just after this
> initialization, otherwise calling those from LIBC_START_MAIN() is too
> late and it runs after initial modules constructors (or not at all for
> Go). However, this means glibc will start using TLS internally. I'm
> concerned that this is not quite in line with the above comment which
> states that TLS is not initialized if no initial modules use TLS.
> 
> For the !SHARED use-case, if my understanding is correct, I should keep
> rseq_init() and rseq_register_current_thread() calls within LIBC_START_MAIN().


I've moved the rseq initialization for SHARED case to the very end of
elf/rtld.c:init_tls(), and get the following error on make check:

Generating locale am_ET.UTF-8: this might take a while...
Inconsistency detected by ld.so: get-dynamic-info.h: 143: elf_get_dynamic_info: Assertion `info[DT_FLAGS] == NULL || (info[DT_FLAGS]->d_un.d_val & ~DF_BIND_NOW) == 0' failed!
Charmap: "UTF-8" Inputfile: "am_ET" Outputdir: "am_ET.UTF-8" failed
/bin/sh: 4: cannot create /home/efficios/git/glibc-build/localedata/am_ET.UTF-8/LC_CTYPE.test-result: Directory nonexistent

This error goes away if I comment out the call to rseq_register_current_thread (),
which touches the __rseq_abi __thread variable and issues a system call.

Currently, the __rseq_abi __thread variable is within
sysdeps/unix/sysv/linux/rseq-sym.c, which is added to the
sysdep_routines within sysdeps/unix/sysv/linux/Makefile. I
suspect it may need to be moved elsewhere.

Any thoughts on how to solve this ?

Thanks,

Mathieu

Mathieu Desnoyers May 30, 2019, 8:56 p.m. | #4
----- On May 29, 2019, at 11:45 AM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On May 27, 2019, at 3:27 PM, Mathieu Desnoyers
> mathieu.desnoyers@efficios.com wrote:
> 
>> ----- On May 27, 2019, at 7:19 AM, Florian Weimer fweimer@redhat.com wrote:
>> 
> 
> [...]
> 
>>> 
>>> Furthermore, the reference to ELF constructors is misleading.  I believe
>>> the code you added to __libc_start_main to initialize __rseq_handled and
>>> register __rseq_abi with the kernel runs *after* ELF constructors have
>>> executed (and not at all if the main program is written in Go, alas).
>>> All initialization activity for the shared case needs to happen in
>>> elf/rtld.c or called from there, probably as part of the security
>>> initialization code or thereabouts.
>> 
>> in elf/rtld.c:dl_main() we have the following code:
>> 
>>  /* We do not initialize any of the TLS functionality unless any of the
>>     initial modules uses TLS.  This makes dynamic loading of modules with
>>     TLS impossible, but to support it requires either eagerly doing setup
>>     now or lazily doing it later.  Doing it now makes us incompatible with
>>     an old kernel that can't perform TLS_INIT_TP, even if no TLS is ever
>>     used.  Trying to do it lazily is too hairy to try when there could be
>>     multiple threads (from a non-TLS-using libpthread).  */
>>  bool was_tls_init_tp_called = tls_init_tp_called;
>>  if (tcbp == NULL)
>>    tcbp = init_tls ();
>> 
>> If I understand your point correctly, I should move the rseq_init() and
>> rseq_register_current_thread() for the SHARED case just after this
>> initialization, otherwise calling those from LIBC_START_MAIN() is too
>> late and it runs after initial modules constructors (or not at all for
>> Go). However, this means glibc will start using TLS internally. I'm
>> concerned that this is not quite in line with the above comment which
>> states that TLS is not initialized if no initial modules use TLS.
>> 
>> For the !SHARED use-case, if my understanding is correct, I should keep
>> rseq_init() and rseq_register_current_thread() calls within LIBC_START_MAIN().
> 
> I've moved the rseq initialization for SHARED case to the very end of
> elf/rtld.c:init_tls(), and get the following error on make check:
> 
> Generating locale am_ET.UTF-8: this might take a while...
> Inconsistency detected by ld.so: get-dynamic-info.h: 143: elf_get_dynamic_info: Assertion `info[DT_FLAGS] == NULL || (info[DT_FLAGS]->d_un.d_val & ~DF_BIND_NOW) == 0' failed!
> Charmap: "UTF-8" Inputfile: "am_ET" Outputdir: "am_ET.UTF-8" failed
> /bin/sh: 4: cannot create /home/efficios/git/glibc-build/localedata/am_ET.UTF-8/LC_CTYPE.test-result: Directory nonexistent
> 
> This error goes away if I comment out the call to rseq_register_current_thread (),
> which touches the __rseq_abi __thread variable and issues a system call.
> 
> Currently, the __rseq_abi __thread variable is within
> sysdeps/unix/sysv/linux/rseq-sym.c, which is added to the
> sysdep_routines within sysdeps/unix/sysv/linux/Makefile. I
> suspect it may need to be moved elsewhere.
> 
> Any thoughts on how to solve this ?


I found that it's because touching a __thread variable from
ld-linux-x86-64.so.2 ends up setting the DF_STATIC_TLS flag
for that .so, which is really not expected.

Even if I tweak the assert to make it more lenient there,
touching the __thread variable ends up triggering a SIGFPE.

So rather than touching the TLS from ld-linux-x86-64.so.2,
I've rather experimented with moving the rseq initialization
for both SHARED and !SHARED cases to a library constructor
within libc.so.

Are you aware of any downside to this approach ?

diff --git a/csu/libc-start.c b/csu/libc-start.c
index 5d9c3675fa..9755ed5467 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -22,6 +22,7 @@
 #include <ldsodefs.h>
 #include <exit-thread.h>
 #include <libc-internal.h>
+#include <rseq-internal.h>
 
 #include <elf/dl-tunables.h>
 
@@ -81,6 +82,14 @@ apply_irel (void)
 }
 #endif
 
+static
+__attribute__ ((constructor))
+void __rseq_libc_init (void)
+{
+  rseq_init ();
+  /* Register rseq ABI to the kernel.   */
+  (void) rseq_register_current_thread ();
+}
 
 #ifdef LIBC_START_MAIN
 # ifdef LIBC_START_DISABLE_INLINE


Thanks,

Mathieu



Florian Weimer May 31, 2019, 8:06 a.m. | #5
* Mathieu Desnoyers:

> I found that it's because touching a __thread variable from
> ld-linux-x86-64.so.2 ends up setting the DF_STATIC_TLS flag
> for that .so, which is really not expected.
>
> Even if I tweak the assert to make it more lenient there,
> touching the __thread variable ends up triggering a SIGFPE.


Sorry, I got distracted at this critical juncture.  Yes, I forgot that
there isn't TLS support in the dynamic loader today.

> So rather than touching the TLS from ld-linux-x86-64.so.2,
> I've rather experimented with moving the rseq initialization
> for both SHARED and !SHARED cases to a library constructor
> within libc.so.
>
> Are you aware of any downside to this approach ?


The information whether the kernel supports rseq would not be available
to IFUNC resolvers.  And in some cases, ELF constructors for application
libraries could run before the libc.so.6 constructor, so applications
would see a transition from lack of kernel support to kernel support.
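The observable transition can be reproduced inside a single program. A constructor that runs before the one standing in for `__rseq_libc_init` sees the "no rseq" state, and `main` later sees the registered state. The GCC/Clang constructor priorities used here only order constructors within one linked object. Across shared objects the dynamic loader decides the order, which is why libc.so.6 cannot guarantee being first this way. All names are hypothetical.

```c
#include <assert.h>

/* Hypothetical stand-ins: -1 means "no rseq", 0 means "registered".  */
static int demo_cpu_id = -1;
static int demo_cpu_id_seen_early = -2;

/* Lower priority numbers run first within one object.  */
static __attribute__ ((constructor (101))) void
demo_early_library_ctor (void)
{
  demo_cpu_id_seen_early = demo_cpu_id;  /* Observes -1: not yet registered.  */
}

/* Stands in for the proposed __rseq_libc_init constructor.  */
static __attribute__ ((constructor (102))) void
demo_rseq_libc_init (void)
{
  demo_cpu_id = 0;  /* Pretend registration succeeded.  */
}
```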

> +static
> +__attribute__ ((constructor))
> +void __rseq_libc_init (void)
> +{
> +  rseq_init ();
> +  /* Register rseq ABI to the kernel.   */
> +  (void) rseq_register_current_thread ();
> +}


I think the call to rseq_init (and the __rseq_handled variable) should
still be part of the dynamic loader.  Otherwise there could be confusion
about whether glibc handles the registration (due to the constructor
ordering issue).

Thanks,
Florian
Mathieu Desnoyers May 31, 2019, 2:48 p.m. | #6
----- On May 31, 2019, at 4:06 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> I found that it's because touching a __thread variable from
>> ld-linux-x86-64.so.2 ends up setting the DF_STATIC_TLS flag
>> for that .so, which is really not expected.
>>
>> Even if I tweak the assert to make it more lenient there,
>> touching the __thread variable ends up triggering a SIGFPE.
> 
> Sorry, I got distracted at this critical juncture.  Yes, I forgot that
> there isn't TLS support in the dynamic loader today.
> 
>> So rather than touching the TLS from ld-linux-x86-64.so.2,
>> I've rather experimented with moving the rseq initialization
>> for both SHARED and !SHARED cases to a library constructor
>> within libc.so.
>>
>> Are you aware of any downside to this approach ?
> 
> The information whether the kernel supports rseq would not be available
> to IFUNC resolvers.  And in some cases, ELF constructors for application
> libraries could run before the libc.so.6 constructor, so applications
> would see a transition from lack of kernel support to kernel support.
> 
>> +static
>> +__attribute__ ((constructor))
>> +void __rseq_libc_init (void)
>> +{
>> +  rseq_init ();
>> +  /* Register rseq ABI to the kernel.   */
>> +  (void) rseq_register_current_thread ();
>> +}
> 
> I think the call to rseq_init (and the __rseq_handled variable) should
> still be part of the dynamic loader.  Otherwise there could be confusion
> about whether glibc handles the registration (due to the constructor
> ordering issue).


Let's break this down into the various sub-issues involved:

1) How early do we need to set up rseq ? Should it be set up before:
   - LD_PRELOAD .so constructors ?
     - Without circular dependency,
     - With circular dependency,
   - audit libraries initialization ?
   - IFUNC resolvers ?
   - other callbacks ?
   - memory allocator calls ?

We may end up in a situation where we need memory allocation to be set up
in order to initialize TLS before rseq can be registered for the main
thread. I suspect we will end up needing fallbacks which always work
for the few cases that would try to use rseq too early in dl/libc startup.

2) Do we need to set up __rseq_handled and __rseq_abi at the same stage of
   startup, or is it OK to set up __rseq_handled before __rseq_abi ?

3) Which shared object owns __rseq_handled and __rseq_abi ?
   - libc.so ?
   - ld-linux-*.so.2 ?
   - Should both symbols be owned by the same .so ?
   - What about the !SHARED case ? I think this would end up in libc.a in all cases.

4) Inability to touch a TLS variable (__rseq_abi) from ld-linux-*.so.2
   - Should we extend the dynamic linker to allow such TLS variable to be
     accessed ? If so, how much effort is required ?
   - Can we find an alternative way to initialize rseq early during
     dl init stages while still performing the TLS access from a function
     implemented within libc.so ?

So far, I got rseq to be initialized before LD_PRELOADed library constructors
by doing the initialization in a constructor within libc.so. I don't particularly
like this approach, because the constructor order is not guaranteed.

One possible solution would be to somehow expose a rseq initialization function
symbol from libc.so, look it up from ld-linux-*.so.2, and invoke it after libc.so
has been loaded. It would end up being similar to a constructor, but with a
fixed invocation order.

I'm just not sure we have everything we need to do this in ld-linux-*.so.2
init stages.

Thoughts ?

Thanks,

Mathieu


Florian Weimer May 31, 2019, 3:46 p.m. | #7
* Mathieu Desnoyers:

> Let's break this down into the various sub-issues involved:
>
> 1) How early do we need to set up rseq ? Should it be set up before:
>    - LD_PRELOAD .so constructors ?
>      - Without circular dependency,
>      - With circular dependency,
>    - audit libraries initialization ?
>    - IFUNC resolvers ?
>    - other callbacks ?
>    - memory allocator calls ?
>
> We may end up in a situation where we need memory allocation to be set up
> in order to initialize TLS before rseq can be registered for the main
> thread. I suspect we will end up needing fallbacks which always work
> for the few cases that would try to use rseq too early in dl/libc startup.


I think the answer to that depends on whether it's okay to have an
observable transition from “no rseq kernel support” to “kernel supports
rseq”.

> 2) Do we need to set up __rseq_handled and __rseq_abi at the same stage of
>    startup, or is it OK to set up __rseq_handled before __rseq_abi ?


I think we should be able to set __rseq_handled early, even if we can
perform the rseq area registration later.  (The distinction does not
matter if the registration needs to be performed early as well.)

Setting __rseq_handled in ld.so is easy if the variable is defined in
ld.so, which is not a problem at all.

> 3) Which shared object owns __rseq_handled and __rseq_abi ?
>    - libc.so ?
>    - ld-linux-*.so.2 ?
>    - Should both symbols be owned by the same .so ?


I think we can pick whatever works, based on the requirements from (1).
It's an implementation detail (although it currently becomes part of the
ABI for weird reasons, the choice itself is arbitrary).

>    - What about the !SHARED case ? I think this would end up in libc.a
>    in all cases.


Correct.

> 4) Inability to touch a TLS variable (__rseq_abi) from ld-linux-*.so.2
>    - Should we extend the dynamic linker to allow such TLS variable to be
>      accessed ? If so, how much effort is required ?
>    - Can we find an alternative way to initialize rseq early during
>      dl init stages while still performing the TLS access from a function
>      implemented within libc.so ?


This is again related to the answer for (1).  There are various hacks we
could implement to make the initialization invisible (e.g., computing
the address of the variable using the equivalent of dlsym, after loading
all the initial objects and before starting relocation).  If it's not
too hard to add TLS support to ld.so, we can consider that as well.
(The allocation side should be pretty easy, relocation support could
be more tricky.)

> So far, I got rseq to be initialized before LD_PRELOADed library
> constructors by doing the initialization in a constructor within
> libc.so. I don't particularly like this approach, because the
> constructor order is not guaranteed.


Right.

> One possible solution would be to somehow expose a rseq initialization
> function symbol from libc.so, look it up from ld-linux-*.so.2, and
> invoke it after libc.so has been loaded. It would end up being similar
> to a constructor, but with a fixed invocation order.


This would still expose lack of rseq support to IFUNC resolvers
initially.  I don't know if this is a problem (again, it comes down to
(1) above).  There is a school of thought that you can't reference
__rseq_abi from an IFUNC resolver because it needs a relocation.

Thanks,
Florian
Mathieu Desnoyers May 31, 2019, 6:10 p.m. | #8
----- On May 31, 2019, at 11:46 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
>
>> Let's break this down into the various sub-issues involved:
>>
>> 1) How early do we need to set up rseq ? Should it be set up before:
>>    - LD_PRELOAD .so constructors ?
>>      - Without circular dependency,
>>      - With circular dependency,
>>    - audit libraries initialization ?
>>    - IFUNC resolvers ?
>>    - other callbacks ?
>>    - memory allocator calls ?
>>
>> We may end up in a situation where we need memory allocation to be set up
>> in order to initialize TLS before rseq can be registered for the main
>> thread. I suspect we will end up needing fallbacks which always work
>> for the few cases that would try to use rseq too early in dl/libc startup.
>
> I think the answer to that depends on whether it's okay to have an
> observable transition from “no rseq kernel support” to “kernel supports
> rseq”.


As far as my own use-cases are concerned, I only care that rseq is initialized
before LD_PRELOAD .so constructors are executed.

There appear to be some documented limitations on what can be
done by IFUNC resolvers. It might be acceptable to document that rseq
might not be initialized yet when those are executed.

I'd like to hear what others think about whether we should care about IFUNC
resolvers and audit libraries using restartable sequences TLS ?

[...]

>
>> 4) Inability to touch a TLS variable (__rseq_abi) from ld-linux-*.so.2
>>    - Should we extend the dynamic linker to allow such TLS variable to be
>>      accessed ? If so, how much effort is required ?
>>    - Can we find an alternative way to initialize rseq early during
>>      dl init stages while still performing the TLS access from a function
>>      implemented within libc.so ?
>
> This is again related to the answer for (1).  There are various hacks we
> could implement to make the initialization invisible (e.g., computing
> the address of the variable using the equivalent of dlsym, after loading
> all the initial objects and before starting relocation).  If it's not
> too hard to add TLS support to ld.so, we can consider that as well.
> (The allocation side should be pretty easy, relocation support could
> be more tricky.)
>
>> So far, I got rseq to be initialized before LD_PRELOADed library
>> constructors by doing the initialization in a constructor within
>> libc.so. I don't particularly like this approach, because the
>> constructor order is not guaranteed.
>
> Right.


One question related to use of constructors: AFAIU, if a library depends
on glibc, ELF guarantees that the glibc constructor will be executed first,
before the other library.

Which leaves us with the execution order of constructors within libc.so,
which is not guaranteed if we just use __attribute__ ((constructor)).
However, all gcc versions that are required to build recent glibc
seem to support a constructor with a "priority" value (lower gets
executed first, and those are executed before constructors without
priority).

Could we do e.g.:

--- a/include/libc-internal.h
+++ b/include/libc-internal.h
@@ -21,6 +21,12 @@
 
 #include <hp-timing.h>
 
+/* Libc constructor priority order. Lower is executed first.  */
+enum libc_constructor_prio {
+       /* Priorities between 0 and 100 are reserved.  */
+       LIBC_CONSTRUCTOR_PRIO_RSEQ_INIT = 1000,
+};
+
 /* Initialize the `__libc_enable_secure' flag.  */
 extern void __libc_init_secure (void);
 
and

csu/libc-start.c:

static
__attribute__ ((constructor (LIBC_CONSTRUCTOR_PRIO_RSEQ_INIT)))
void __rseq_libc_init (void)
{
  rseq_init ();
  /* Register rseq ABI to the kernel.   */
  (void) rseq_register_current_thread ();
}
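To illustrate the ordering relied on above, here is a minimal standalone sketch (not glibc code; every name and priority value in it is hypothetical). GCC reserves priorities 0 to 100 for the implementation, so the sketch uses 101 and 1000. The position of the unprioritized constructor comes from GCC emitting prioritized constructors into .init_array.NNNNN sections, which the GNU linker's default script sorts ahead of the plain .init_array section. It is therefore a toolchain convention within one object, not an ELF guarantee.

```c
#include <stdbool.h>

/* Hypothetical IDs recording which constructor ran.  */
enum demo_ctor_id
{
  DEMO_CTOR_PLAIN = 1,
  DEMO_CTOR_LATER,
  DEMO_CTOR_RSEQ_INIT
};

/* Hypothetical priorities; GCC reserves 0..100 for the implementation.  */
#define DEMO_PRIO_RSEQ_INIT 101
#define DEMO_PRIO_LATER     1000

static enum demo_ctor_id ctor_order[3];
static int ctor_count;

static void
record (enum demo_ctor_id id)
{
  if (ctor_count < 3)
    ctor_order[ctor_count++] = id;
}

/* No priority: appears first in the file, but with GCC and the GNU
   linker it runs after every prioritized constructor.  */
static __attribute__ ((constructor)) void
demo_plain_ctor (void)
{
  record (DEMO_CTOR_PLAIN);
}

/* Larger priority number: runs after DEMO_PRIO_RSEQ_INIT.  */
static __attribute__ ((constructor (DEMO_PRIO_LATER))) void
demo_later_ctor (void)
{
  record (DEMO_CTOR_LATER);
}

/* Smallest priority number: runs first.  */
static __attribute__ ((constructor (DEMO_PRIO_RSEQ_INIT))) void
demo_rseq_init_ctor (void)
{
  record (DEMO_CTOR_RSEQ_INIT);
}

/* Returns true if the constructors ran as rseq_init, later, plain.  */
bool
demo_ctor_order_ok (void)
{
  return ctor_count == 3
         && ctor_order[0] == DEMO_CTOR_RSEQ_INIT
         && ctor_order[1] == DEMO_CTOR_LATER
         && ctor_order[2] == DEMO_CTOR_PLAIN;
}
```

Built with GCC on GNU/Linux, demo_ctor_order_ok () should return true once main is reached. Calling the initialization steps from a single function in a fixed order avoids depending on this toolchain behavior.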

[...]

Thanks,

Mathieu




Florian Weimer June 4, 2019, 11:46 a.m. | #9
* Mathieu Desnoyers:

> ----- On May 31, 2019, at 11:46 AM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>>
>>> Let's break this down into the various sub-issues involved:
>>>
>>> 1) How early do we need to set up rseq ? Should it be set up before:
>>>    - LD_PRELOAD .so constructors ?
>>>      - Without circular dependency,
>>>      - With circular dependency,
>>>    - audit libraries initialization ?
>>>    - IFUNC resolvers ?
>>>    - other callbacks ?
>>>    - memory allocator calls ?
>>>
>>> We may end up in a situation where we need memory allocation to be set up
>>> in order to initialize TLS before rseq can be registered for the main
>>> thread. I suspect we will end up needing fallbacks which always work
>>> for the few cases that would try to use rseq too early in dl/libc startup.
>>
>> I think the answer to that depends on whether it's okay to have an
>> observable transition from “no rseq kernel support” to “kernel supports
>> rseq”.
>
> As far as my own use-cases are concerned, I only care that rseq is initialized
> before LD_PRELOAD .so constructors are executed.


<https://sourceware.org/bugzilla/show_bug.cgi?id=14379> is relevant in
this context.  It requests the opposite behavior from LD_PRELOAD.

> There appear to be some documented limitations on what can be
> done by IFUNC resolvers. It might be acceptable to document that rseq
> might not be initialized yet when those are executed.


The only obstacle is that there are so many places where we could put
this information.

> I'd like to hear what others think about whether we should care about IFUNC
> resolvers and audit libraries using restartable sequences TLS ?


In audit libraries (and after dlmopen), the inner libc will have
duplicated TLS values, so it will look as if the TLS area is not active
(but a registration has happened with the kernel).  If we move
__rseq_handled into the dynamic linker, its value will be shared along
with ld.so with the inner objects.  However, the inner libc still has to
ensure that its registration attempt does not succeed because that would
activate the wrong rseq area.

The final remaining case is static dlopen.  There is a copy of ld.so on
the dynamic side, but it is completely inactive and has never run.  I do
not think we need to support that because multi-threading does not work
reliably in this scenario, either.  However, we should skip rseq
registration in a nested libc (see the rtld_active function).

>>> 4) Inability to touch a TLS variable (__rseq_abi) from ld-linux-*.so.2
>>>    - Should we extend the dynamic linker to allow such TLS variable to be
>>>      accessed ? If so, how much effort is required ?
>>>    - Can we find an alternative way to initialize rseq early during
>>>      dl init stages while still performing the TLS access from a function
>>>      implemented within libc.so ?
>>
>> This is again related to the answer for (1).  There are various hacks we
>> could implement to make the initialization invisible (e.g., computing
>> the address of the variable using the equivalent of dlsym, after loading
>> all the initial objects and before starting relocation).  If it's not
>> too hard to add TLS support to ld.so, we can consider that as well.
>> (The allocation side should be pretty easy, relocation support could
>> be more tricky.)
>>
>>> So far, I got rseq to be initialized before LD_PRELOADed library
>>> constructors by doing the initialization in a constructor within
>>> libc.so. I don't particularly like this approach, because the
>>> constructor order is not guaranteed.
>>
>> Right.
>
> One question related to use of constructors: AFAIU, if a library depends
> on glibc, ELF guarantees that the glibc constructor will be executed first,
> before the other library.


There are some exceptions, like DT_PREINIT_ARRAY functions and
DF_1_INITFIRST.  Some of these mechanisms we use in the implementation
itself, so they are not really usable by end users.  Cycles should not
come into play here.

By default, an object that uses the rseq area will have to link against
libc (perhaps indirectly), and therefore the libc constructor runs
first.

> Which leaves us with the execution order of constructors within libc.so,
> which is not guaranteed if we just use __attribute__ ((constructor)).
> However, all gcc versions that are required to build recent glibc
> seem to support a constructor with a "priority" value (lower gets
> executed first, and those are executed before constructors without
> priority).


I'm not sure that's the right way to do it.  If we want execution to
happen in a specific order, we should write a single constructor
function which is called from _init.  For the time being, we can add the
call to an appropriately defined inline function early in _init in
elf/init-first.c (which is shared with Hurd, so Hurd will need some sort
of stub function).

Thanks,
Florian
Mathieu Desnoyers June 4, 2019, 3:57 p.m. | #10
----- On Jun 4, 2019, at 7:46 AM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
>
>> ----- On May 31, 2019, at 11:46 AM, Florian Weimer fweimer@redhat.com wrote:
>>
>>> * Mathieu Desnoyers:
>>>
>>>> Let's break this down into the various sub-issues involved:
>>>>
>>>> 1) How early do we need to set up rseq ? Should it be set up before:
>>>>    - LD_PRELOAD .so constructors ?
>>>>      - Without circular dependency,
>>>>      - With circular dependency,
>>>>    - audit libraries initialization ?
>>>>    - IFUNC resolvers ?
>>>>    - other callbacks ?
>>>>    - memory allocator calls ?
>>>>
>>>> We may end up in a situation where we need memory allocation to be set up
>>>> in order to initialize TLS before rseq can be registered for the main
>>>> thread. I suspect we will end up needing fallbacks which always work
>>>> for the few cases that would try to use rseq too early in dl/libc startup.
>>>
>>> I think the answer to that depends on whether it's okay to have an
>>> observable transition from “no rseq kernel support” to “kernel supports
>>> rseq”.
>>
>> As far as my own use-cases are concerned, I only care that rseq is initialized
>> before LD_PRELOAD .so constructors are executed.
>
> <https://sourceware.org/bugzilla/show_bug.cgi?id=14379> is relevant in
> this context.  It requests the opposite behavior from LD_PRELOAD.


This link is very interesting. It sheds some light on how an LD_PRELOAD user
wants to override malloc.

Should we plan ahead for such a scheme to override which library "owns" rseq
registration from an LD_PRELOAD library ? If so, then we would want glibc to
set __rseq_handled _after_ LD_PRELOAD ctors are executed.

However, this brings the following situation: lttng-ust can be LD_PRELOADed
into applications, and I intend to make it provide rseq registration *only if*
the glibc does not provide it.

As a brainstorm idea, one way around this would be to turn __rseq_handled into
a 4-state variable:

RSEQ_REG_UNSET = 0, -> no library handles rseq
RSEQ_REG_PREINIT = 1, -> libc supports RSEQ, initialization not done yet,
RSEQ_REG_LIBC = 2, -> libc supports RSEQ, owns registration,
RSEQ_REG_OVERRIDE = 3, -> LD_PRELOAD library owns registration.

So an lttng-ust LD_PRELOAD library could manage rseq registration by setting
__rseq_handled = RSEQ_REG_OVERRIDE only after observing the state
RSEQ_REG_UNSET.

An LD_PRELOAD library wishing to override the libc rseq management should set
__rseq_handled to RSEQ_REG_OVERRIDE after observing either UNSET or PREINIT.
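To make the proposed ownership rules concrete, here is a minimal sketch using C11 atomics. The four state names come from the proposal above; the variable demo_rseq_reg_owner and the three functions are hypothetical. Using compare-and-swap rather than a plain load followed by a store is an assumption made here, so that two parties racing to claim ownership cannot both succeed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* States proposed for __rseq_handled.  */
enum rseq_reg_state
{
  RSEQ_REG_UNSET = 0,    /* No library handles rseq.  */
  RSEQ_REG_PREINIT = 1,  /* libc supports rseq, initialization not done yet.  */
  RSEQ_REG_LIBC = 2,     /* libc supports rseq and owns registration.  */
  RSEQ_REG_OVERRIDE = 3, /* An LD_PRELOAD library owns registration.  */
};

/* Hypothetical stand-in for __rseq_handled (or __rseq_reg_owner).  */
static _Atomic int demo_rseq_reg_owner = RSEQ_REG_UNSET;

/* lttng-ust style: take ownership only if no libc handles rseq.  */
static bool
claim_if_unhandled (void)
{
  int expected = RSEQ_REG_UNSET;
  return atomic_compare_exchange_strong (&demo_rseq_reg_owner, &expected,
                                         RSEQ_REG_OVERRIDE);
}

/* Override library: take ownership from UNSET or PREINIT, but not
   once libc has completed its initialization.  */
static bool
claim_override (void)
{
  int cur = atomic_load (&demo_rseq_reg_owner);
  while (cur == RSEQ_REG_UNSET || cur == RSEQ_REG_PREINIT)
    {
      /* On failure, cur is reloaded with the current value.  */
      if (atomic_compare_exchange_weak (&demo_rseq_reg_owner, &cur,
                                        RSEQ_REG_OVERRIDE))
        return true;
    }
  return false;
}

/* libc side: move from PREINIT to LIBC unless someone overrode it.  */
static bool
libc_finish_init (void)
{
  int expected = RSEQ_REG_PREINIT;
  return atomic_compare_exchange_strong (&demo_rseq_reg_owner, &expected,
                                         RSEQ_REG_LIBC);
}
```

Starting from PREINIT, claim_if_unhandled () fails and libc_finish_init () succeeds; starting from UNSET (an older glibc that never sets PREINIT), claim_if_unhandled () succeeds; starting from PREINIT, claim_override () wins and the later libc_finish_init () fails. These roughly correspond to startup sequences A, B and C further down in this message.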

>
>> There appear to be some documented limitations on what can be
>> done by IFUNC resolvers. It might be acceptable to document that rseq
>> might not be initialized yet when those are executed.
>
> The only obstacle is that there are so many places where we could put
> this information.


If glibc postpones the actual rseq registration until after LD_PRELOAD ctors
have executed, I think it makes it clear that we have a part of the startup
which executes without rseq being registered:

(please let me know if I'm getting some things wrong in the following sequences)

A) Startup sequence (glibc owns rseq):

                                  __rseq_handled          __rseq_abi (TLS)
                                  --------------          ----------------------
                                  RSEQ_REG_UNSET          no TLS available
                                  RSEQ_REG_PREINIT
IFUNC resolvers,
audit libraries...
                                                          TLS becomes available
LD_PRELOAD ctors
glibc initialization              RSEQ_REG_LIBC
                                                          registered to kernel by sys_rseq.


B) Startup sequence (LD_PRELOAD lttng-ust owns rseq, old glibc):

                                  __rseq_handled          __rseq_abi (TLS)
                                  --------------          ----------------------
                                  RSEQ_REG_UNSET          no TLS available
IFUNC resolvers,
audit libraries...
                                                          TLS becomes available
LD_PRELOAD ctors                  RSEQ_REG_OVERRIDE
                                                          registered to kernel by sys_rseq.


C) Startup sequence (LD_PRELOAD rseq override library owning rseq):

                                  __rseq_handled          __rseq_abi (TLS)
                                  --------------          ----------------------
                                  RSEQ_REG_UNSET          no TLS available
                                  RSEQ_REG_PREINIT
IFUNC resolvers,
audit libraries...
                                                          TLS becomes available
LD_PRELOAD ctors                  RSEQ_REG_OVERRIDE
                                                          registered to kernel by sys_rseq.
glibc initialization

>
>> I'd like to hear what others think about whether we should care about IFUNC
>> resolvers and audit libraries using restartable sequences TLS ?
>
> In audit libraries (and after dlmopen), the inner libc will have
> duplicated TLS values, so it will look as if the TLS area is not active
> (but a registration has happened with the kernel).  If we move
> __rseq_handled into the dynamic linker, its value will be shared along
> with ld.so with the inner objects.  However, the inner libc still has to
> ensure that its registration attempt does not succeed because that would
> activate the wrong rseq area.


Having an intermediate RSEQ_REG_PREINIT state covering the entire
duration where the inner libc is in use should do the trick to ensure
the duplicated TLS area is not used at that point.

The covered use-cases would be to override rseq registration ownership
from LD_PRELOADed libraries, but disallow it from IFUNC resolvers and
audit libraries.

As a consequence of this, rseq critical sections should be prepared
to use a fallback mechanism (e.g. the cpu_opv system call I have been
trying to upstream) when they notice that rseq is not initialized, for
instance for an rseq critical section executed during the preinit stage,
or very early or late in a thread's lifetime. This is a requirement I
have seen coming for a while now. Testing for non-registered rseq is
very straightforward and fast to do on a fast-path through the
__rseq_abi.cpu_id field: it has a negative value if rseq is not
registered for the current thread.
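The fast-path test described above amounts to one signed comparison before entering the critical section. Below is a minimal sketch of that dispatch. struct demo_rseq only mirrors the field layout of the kernel's struct rseq (cpu_id_start, cpu_id, rseq_cs, flags); the per-CPU counter, the atomic-add fallback standing in for cpu_opv, and all names are hypothetical. The real rseq critical section (assembly, restarted by the kernel) is elided.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of the kernel's struct rseq layout.  The kernel
   declares cpu_id as __u32; it is read as signed here so the
   "not registered" values (-1 uninitialized, -2 registration failed)
   compare as negative.  */
struct demo_rseq
{
  uint32_t cpu_id_start;
  int32_t cpu_id;
  uint64_t rseq_cs;
  uint32_t flags;
} __attribute__ ((aligned (32)));

/* Hypothetical per-thread area, starting out "uninitialized" (-1).  */
static __thread struct demo_rseq demo_rseq_abi = { .cpu_id = -1 };

enum demo_path { DEMO_PATH_RSEQ, DEMO_PATH_FALLBACK };

#define DEMO_NR_CPUS 64
static _Atomic long demo_counters[DEMO_NR_CPUS];
static _Atomic long demo_shared_counter;

/* Increment a per-CPU counter; returns which path was taken.  */
static enum demo_path
demo_percpu_inc (void)
{
  int32_t cpu = __atomic_load_n (&demo_rseq_abi.cpu_id, __ATOMIC_RELAXED);
  if (cpu < 0)
    {
      /* rseq not registered for this thread (too early or too late in
         its lifetime): use a slower but always-correct fallback.  */
      atomic_fetch_add (&demo_shared_counter, 1);
      return DEMO_PATH_FALLBACK;
    }
  /* Placeholder for the rseq critical section indexed by cpu.  An
     atomic add keeps this sketch correct without real rseq.  */
  atomic_fetch_add (&demo_counters[cpu % DEMO_NR_CPUS], 1);
  return DEMO_PATH_RSEQ;
}
```

In a real program the kernel writes cpu_id once registration succeeds; here a test can simply assign demo_rseq_abi.cpu_id to simulate that.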

>
> The final remaining case is static dlopen.  There is a copy of ld.so on
> the dynamic side, but it is completely inactive and has never run.  I do
> not think we need to support that because multi-threading does not work
> reliably in this scenario, either.  However, we should skip rseq
> registration in a nested libc (see the rtld_active function).


So for SHARED, if (!rtld_active ()), we should indeed leave the state of
__rseq_handled as it is, because we are within a nested inactive ld.so.

>
>>>> 4) Inability to touch a TLS variable (__rseq_abi) from ld-linux-*.so.2
>>>>    - Should we extend the dynamic linker to allow such TLS variable to be
>>>>      accessed ? If so, how much effort is required ?
>>>>    - Can we find an alternative way to initialize rseq early during
>>>>      dl init stages while still performing the TLS access from a function
>>>>      implemented within libc.so ?
>>>
>>> This is again related to the answer for (1).  There are various hacks we
>>> could implement to make the initialization invisible (e.g., computing
>>> the address of the variable using the equivalent of dlsym, after loading
>>> all the initial objects and before starting relocation).  If it's not
>>> too hard to add TLS support to ld.so, we can consider that as well.
>>> (The allocation side should be pretty easy, relocation support could
>>> be more tricky.)
>>>
>>>> So far, I got rseq to be initialized before LD_PRELOADed library
>>>> constructors by doing the initialization in a constructor within
>>>> libc.so. I don't particularly like this approach, because the
>>>> constructor order is not guaranteed.
>>>
>>> Right.
>>
>> One question related to use of constructors: AFAIU, if a library depends
>> on glibc, ELF guarantees that the glibc constructor will be executed first,
>> before the other library.
>
> There are some exceptions, like DT_PREINIT_ARRAY functions and
> DF_1_INITFIRST.  Some of these mechanisms we use in the implementation
> itself, so they are not really usable by end users.  Cycles should not
> come into play here.
>
> By default, an object that uses the rseq area will have to link against
> libc (perhaps indirectly), and therefore the libc constructor runs
> first.


If we agree on postponing the actual TLS registration _after_ LD_PRELOAD
ctors are executed, the problem becomes easier. We then only need to
move __rseq_handled to ld.so, and set it to a PREINIT state until we
eventually perform the TLS registration (after LD_PRELOAD ctors).

>
>> Which leaves us with the execution order of constructors within libc.so,
>> which is not guaranteed if we just use __attribute__ ((constructor)).
>> However, all gcc versions that are required to build recent glibc
>> seem to support a constructor with a "priority" value (lower gets
>> executed first, and those are executed before constructors without
>> priority).
>
> I'm not sure that's the right way to do it.  If we want execution to
> happen in a specific order, we should write a single constructor
> function which is called from _init.  For the time being, we can add the
> call to an appropriately defined inline function early in _init in
> elf/init-first.c (which is shared with Hurd, so Hurd will need some sort
> of stub function).


In my attempts, there were some cases where _init was not invoked before
LD_PRELOAD ctors, but I cannot remember which at this point. Anyhow, if
we choose to postpone the actual TLS registration after LD_PRELOAD ctors,
this becomes a non-issue.

We might want to rename the __rseq_handled symbol to a better name if
it becomes a 4-state variable, e.g. __rseq_reg_owner.

Thoughts ?

Thanks,

Mathieu


>
> Thanks,
> Florian


Florian Weimer June 6, 2019, 11:57 a.m. | #11
* Mathieu Desnoyers:

> Should we plan ahead for such a scheme to override which library "owns" rseq
> registration from an LD_PRELOAD library ? If so, then we would want glibc to
> set __rseq_handled _after_ LD_PRELOAD ctors are executed.


I don't think so.  The LD_PRELOAD phase is not clearly delineated from
the non-preload phase.  So it's not clear to me what this would even
mean in practice.

Let me ask the key question again: Does it matter if code observes the
rseq area first without kernel support, and then with kernel support?
If we don't expect any problems immediately, we do not need to worry
much about the constructor ordering right now.  I expect that over time,
fixing this properly will become easier.

>> The final remaining case is static dlopen.  There is a copy of ld.so on
>> the dynamic side, but it is completely inactive and has never run.  I do
>> not think we need to support that because multi-threading does not work
>> reliably in this scenario, either.  However, we should skip rseq
>> registration in a nested libc (see the rtld_active function).
>
> So for SHARED, if (!rtld_active ()), we should indeed leave the state of
> __rseq_handled as it is, because we are within a nested inactive ld.so.


I think we should add __rseq_handled initialization to ld.so, so it will
only run once, ever.

It's the registration from libc.so which needs some care.  In
particular, we must not override an existing registration.

Thanks,
Florian
Carlos O'Donell June 10, 2019, 2:43 p.m. | #12
On 6/6/19 7:57 AM, Florian Weimer wrote:
> Let me ask the key question again: Does it matter if code observes the
> rseq area first without kernel support, and then with kernel support?
> If we don't expect any problems immediately, we do not need to worry
> much about the constructor ordering right now.  I expect that over time,
> fixing this properly will become easier.


I just wanted to chime in and say that splitting this into:

* Ownership (__rseq_handled)

* Initialization (__rseq_abi)

Makes sense to me.

I agree we need an answer to this question of ownership but not yet
initialized, to owned and initialized.

I like the idea of having __rseq_handled in ld.so.

-- 
Cheers,
Carlos.
Mathieu Desnoyers June 12, 2019, 2 p.m. | #13
----- On Jun 10, 2019, at 4:43 PM, carlos carlos@redhat.com wrote:

> On 6/6/19 7:57 AM, Florian Weimer wrote:
>> Let me ask the key question again: Does it matter if code observes the
>> rseq area first without kernel support, and then with kernel support?
>> If we don't expect any problems immediately, we do not need to worry
>> much about the constructor ordering right now.  I expect that over time,
>> fixing this properly will become easier.
>
> I just wanted to chime in and say that splitting this into:
>
> * Ownership (__rseq_handled)
>
> * Initialization (__rseq_abi)
>
> Makes sense to me.
>
> I agree we need an answer to this question of ownership but not yet
> initialized, to owned and initialized.
>
> I like the idea of having __rseq_handled in ld.so.


Very good, so I'll implement this approach. Sorry for the delayed
feedback, I am traveling this week.

Thanks,

Mathieu


Mathieu Desnoyers June 12, 2019, 2:16 p.m. | #14
----- On Jun 6, 2019, at 1:57 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
>
[...]
>
>>> The final remaining case is static dlopen.  There is a copy of ld.so on
>>> the dynamic side, but it is completely inactive and has never run.  I do
>>> not think we need to support that because multi-threading does not work
>>> reliably in this scenario, either.  However, we should skip rseq
>>> registration in a nested libc (see the rtld_active function).
>>
>> So for SHARED, if (!rtld_active ()), we should indeed leave the state of
>> __rseq_handled as it is, because we are within a nested inactive ld.so.
>
> I think we should add __rseq_handled initialization to ld.so, so it will
> only run once, ever.


OK

>
> It's the registration from libc.so which needs some care.  In
> particular, we must not override an existing registration.


OK, so it could check if __rseq_abi.cpu_id is -1, and only
perform registration if it is the case. Or do you have another
approach in mind ?

For the main thread, "nested" unregistration does not appear to be a
problem, because we rely on program exit() to implicitly unregister.

Thanks,

Mathieu

>
> Thanks,
> Florian


Florian Weimer June 12, 2019, 2:22 p.m. | #15
* Mathieu Desnoyers:

>> It's the registration from libc.so which needs some care.  In
>> particular, we must not override an existing registration.
>
> OK, so it could check if __rseq_abi.cpu_id is -1, and only
> perform registration if it is the case. Or do you have another
> approach in mind ?


No, __rseq_abi will not be shared with the outer libc, so the inner libc
will always see -1 there, even if the outer libc has performed
registration.

libio/vtables.c has an example of what you can do:

  /* In case this libc copy is in a non-default namespace, we always
     need to accept foreign vtables because there is always a
     possibility that FILE * objects are passed across the linking
     boundary.  */
  {
    Dl_info di;
    struct link_map *l;
    if (!rtld_active ()
        || (_dl_addr (_IO_vtable_check, &di, &l, NULL) != 0
            && l->l_ns != LM_ID_BASE))
      return;
  }

_IO_vtable_check would have to be replaced with your own function; the
actual function doesn't really matter.

The rtld_active check covers the static dlopen case, where
rtld_active () is false in the inner libc.
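Combining the two conditions from the vtables.c check above, the libc-side decision could be sketched as follows. rtld_active () and the link map namespace test are glibc-internal, so they are replaced here by hypothetical boolean inputs; the struct and function names are illustrative only, not a proposed interface.

```c
#include <stdbool.h>

/* Hypothetical inputs standing in for glibc-internal queries:
   rtld_active () and "l->l_ns == LM_ID_BASE" for the link map that
   contains this libc copy.  */
struct demo_libc_context
{
  bool rtld_active;        /* false in the inner libc of a static dlopen */
  bool in_base_namespace;  /* false after dlmopen or in audit modules */
};

/* Returns true if this libc copy may register its rseq area with the
   kernel.  __rseq_abi.cpu_id cannot drive this decision: an inner
   libc has its own TLS copy, which reads -1 even when the outer libc
   has already registered.  */
static bool
demo_libc_may_register_rseq (const struct demo_libc_context *ctx)
{
  return ctx->rtld_active && ctx->in_base_namespace;
}
```

Only the outer libc in the base namespace, running under an active dynamic linker, gets true; the static dlopen and dlmopen/audit copies are both refused.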

Thanks,
Florian
Mathieu Desnoyers June 12, 2019, 2:36 p.m. | #16
----- On Jun 12, 2019, at 4:22 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
>
>>> It's the registration from libc.so which needs some care.  In
>>> particular, we must not override an existing registration.
>>
>> OK, so it could check if __rseq_abi.cpu_id is -1, and only
>> perform registration if it is the case. Or do you have another
>> approach in mind ?
>
> No, __rseq_abi will not be shared with the outer libc, so the inner libc
> will always see -1 there, even if the outer libc has performed
> registration.
>
> libio/vtables.c has an example of what you can do:
>
>  /* In case this libc copy is in a non-default namespace, we always
>     need to accept foreign vtables because there is always a
>     possibility that FILE * objects are passed across the linking
>     boundary.  */
>  {
>    Dl_info di;
>    struct link_map *l;
>    if (!rtld_active ()
>        || (_dl_addr (_IO_vtable_check, &di, &l, NULL) != 0
>            && l->l_ns != LM_ID_BASE))
>      return;
>  }
>
> _IO_vtable_check would have to be replaced with your own function; the
> actual function doesn't really matter.
>
> The rtld_active check covers the static dlopen case, where
> rtld_active () is false in the inner libc.


Then out of curiosity, would it also work if I check for

if (!__libc_multiple_libcs)

in LIBC_START_MAIN ?

Thanks,

Mathieu

Florian Weimer June 12, 2019, 2:43 p.m. | #17
* Mathieu Desnoyers:

> ----- On Jun 12, 2019, at 4:22 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>>
>>>> It's the registration from libc.so which needs some care.  In
>>>> particular, we must not override an existing registration.
>>>
>>> OK, so it could check if __rseq_abi.cpu_id is -1, and only
>>> perform registration if it is the case. Or do you have another
>>> approach in mind ?
>>
>> No, __rseq_abi will not be shared with the outer libc, so the inner libc
>> will always see -1 there, even if the outer libc has performed
>> registration.
>>
>> libio/vtables.c has an example of what you can do:
>>
>>  /* In case this libc copy is in a non-default namespace, we always
>>     need to accept foreign vtables because there is always a
>>     possibility that FILE * objects are passed across the linking
>>     boundary.  */
>>  {
>>    Dl_info di;
>>    struct link_map *l;
>>    if (!rtld_active ()
>>        || (_dl_addr (_IO_vtable_check, &di, &l, NULL) != 0
>>            && l->l_ns != LM_ID_BASE))
>>      return;
>>  }
>>
>> _IO_vtable_check would have to be replaced with your own function; the
>> actual function doesn't really matter.
>>
>> The rtld_active check covers the static dlopen case, where
>> rtld_active () is false in the inner libc.
>
> Then out of curiosity, would it also work if I check for
>
> if (!__libc_multiple_libcs)
>
> in LIBC_START_MAIN ?


In my experience, __libc_multiple_libcs is not reliable.  I have not yet
figured out why.

Thanks,
Florian
Mathieu Desnoyers June 14, 2019, 10:03 a.m. | #18
----- On Jun 12, 2019, at 4:00 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jun 10, 2019, at 4:43 PM, carlos carlos@redhat.com wrote:
>
>> On 6/6/19 7:57 AM, Florian Weimer wrote:
>>> Let me ask the key question again: Does it matter if code observes the
>>> rseq area first without kernel support, and then with kernel support?
>>> If we don't expect any problems immediately, we do not need to worry
>>> much about the constructor ordering right now.  I expect that over time,
>>> fixing this properly will become easier.
>>
>> I just wanted to chime in and say that splitting this into:
>>
>> * Ownership (__rseq_handled)
>>
>> * Initialization (__rseq_abi)
>>
>> Makes sense to me.
>>
>> I agree we need an answer to this question of ownership but not yet
>> initialized, to owned and initialized.
>>
>> I like the idea of having __rseq_handled in ld.so.
>
> Very good, so I'll implement this approach. Sorry for the delayed
> feedback, I am traveling this week.


I had issues with cases where the application or an LD_PRELOAD library also
defines the __rseq_handled symbol. They appear not to see the same
address as the one initialized by ld.so.

I tried using the GL() macro in ld.so to set __rseq_handled, but it's
the wrong address compared to what the preload lib and application observe.

Any thoughts on how to solve this ?

Thanks,

Mathieu

Florian Weimer June 14, 2019, 10:06 a.m. | #19
* Mathieu Desnoyers:

> ----- On Jun 12, 2019, at 4:00 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:
>
>> ----- On Jun 10, 2019, at 4:43 PM, carlos carlos@redhat.com wrote:
>>
>>> On 6/6/19 7:57 AM, Florian Weimer wrote:
>>>> Let me ask the key question again: Does it matter if code observes the
>>>> rseq area first without kernel support, and then with kernel support?
>>>> If we don't expect any problems immediately, we do not need to worry
>>>> much about the constructor ordering right now.  I expect that over time,
>>>> fixing this properly will become easier.
>>>
>>> I just wanted to chime in and say that splitting this into:
>>>
>>> * Ownership (__rseq_handled)
>>>
>>> * Initialization (__rseq_abi)
>>>
>>> Makes sense to me.
>>>
>>> I agree we need an answer to this question of ownership but not yet
>>> initialized, to owned and initialized.
>>>
>>> I like the idea of having __rseq_handled in ld.so.
>>
>> Very good, so I'll implement this approach. Sorry for the delayed
>> feedback, I am traveling this week.
>
> I had issues with cases where the application or an LD_PRELOAD library also
> defines the __rseq_handled symbol. They appear not to see the same
> address as the one initialized by ld.so.


What exactly did you do?  How did you determine the addresses?  How is
__rseq_handled defined in ld.so?

Thanks,
Florian
Mathieu Desnoyers June 14, 2019, 10:14 a.m. | #20
----- On Jun 14, 2019, at 12:06 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> ----- On Jun 12, 2019, at 4:00 PM, Mathieu Desnoyers
>> mathieu.desnoyers@efficios.com wrote:
>>
>>> ----- On Jun 10, 2019, at 4:43 PM, carlos carlos@redhat.com wrote:
>>> 
>>>> On 6/6/19 7:57 AM, Florian Weimer wrote:
>>>>> Let me ask the key question again: Does it matter if code observes the
>>>>> rseq area first without kernel support, and then with kernel support?
>>>>> If we don't expect any problems immediately, we do not need to worry
>>>>> much about the constructor ordering right now.  I expect that over time,
>>>>> fixing this properly will become easier.
>>>> 
>>>> I just wanted to chime in and say that splitting this into:
>>>> 
>>>> * Ownership (__rseq_handled)
>>>> 
>>>> * Initialization (__rseq_abi)
>>>> 
>>>> Makes sense to me.
>>>> 
>>>> I agree we need an answer to this question of ownership but not yet
>>>> initialized, to owned and initialized.
>>>> 
>>>> I like the idea of having __rseq_handled in ld.so.
>>> 
>>> Very good, so I'll implement this approach. Sorry for the delayed
>>> feedback, I am traveling this week.
>>
>> I had issues with cases where application or LD_PRELOAD library also
>> define the __rseq_handled symbol. They appear not to see the same
>> address as the one initialized by ld.so.
> 
> What exactly did you do?  How did you determine the addresses?  How is
> __rseq_handled defined in ld.so?

The easiest way to answer these questions is through links to my github
dev branch:

https://github.com/compudj/glibc-dev/tree/glibc-rseq

specifically this commit:
https://github.com/compudj/glibc-dev/commit/c49a286497d065a7fc00aafd846e6edce14f97fc
and this attempt at using GL():
https://github.com/compudj/glibc-dev/commit/8a02acfbb6943672bfa36b4fc6f61905ee4fa180

My test programs are:

* a.c:

#include <stdio.h>
#include <linux/rseq.h>

extern __thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec")));/* = {
	.cpu_id = -1,
};*/
extern int __rseq_handled;

int main()
{
	fprintf(stderr, "__rseq_handled main: %d %p\n", __rseq_handled, &__rseq_handled);
	fprintf(stderr, "__rseq_abi.cpu_id main: %d %p\n", __rseq_abi.cpu_id, &__rseq_abi);
	return 0;
}

* s.c:

#include <stdio.h>
#include <linux/rseq.h>

#if 0
__thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec"))) = {
	.cpu_id = -1,
};
int __rseq_handled;

#else
extern __thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec")));
extern int __rseq_handled;
#endif

void __attribute__((constructor)) myinit(void)
{
	fprintf(stderr, "__rseq_handled s.so: %d %p\n", __rseq_handled, &__rseq_handled);
	fprintf(stderr, "__rseq_abi.cpu_id s.so: %d %p\n", __rseq_abi.cpu_id, &__rseq_abi);
}

* Makefile:

LIBCPATH=/home/efficios/glibc-test/lib
KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH} -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2

all:
	gcc ${CFLAGS} -o a a.c
	gcc ${CFLAGS} -shared -fPIC -o s.so s.c

Thanks,

Mathieu

> 
> Thanks,
> Florian

Florian Weimer June 14, 2019, 11:35 a.m. | #21
* Mathieu Desnoyers:

> * Makefile:
>
> LIBCPATH=/home/efficios/glibc-test/lib
> KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
> CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH} -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2
>
> all:
> 	gcc ${CFLAGS} -o a a.c
> 	gcc ${CFLAGS} -shared -fPIC -o s.so s.c

For me, that does not correctly link against the built libc because the
system dynamic loader seeps into the link.

> specifically this commit:
> https://github.com/compudj/glibc-dev/commit/c49a286497d065a7fc00aafd846e6edce14f97fc

This commit links __rseq_handled into libc.so.6 via rseq-sym.c, but does
not export it from there.

Thanks,
Florian
Mathieu Desnoyers June 14, 2019, 12:55 p.m. | #22
----- On Jun 14, 2019, at 1:35 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> * Makefile:
>>
>> LIBCPATH=/home/efficios/glibc-test/lib
>> KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
>> CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH}
>> -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2
>>
>> all:
>> 	gcc ${CFLAGS} -o a a.c
>> 	gcc ${CFLAGS} -shared -fPIC -o s.so s.c
> 
> For me, that does not correctly link against the built libc because the
> system dynamic loader seeps into the link.

I have the same issue. I tried adding "-B${LIBCPATH}" as well, but it did
not seem to help. I still have this ldd output:

ldd a
./a: /lib64/ld-linux-x86-64.so.2: version `GLIBC_2.30' not found (required by ./a)
	linux-vdso.so.1 (0x00007fffaa7e9000)
	libc.so.6 => /home/efficios/glibc-test/lib/libc.so.6 (0x00007fac5d479000)
	/home/efficios/glibc-test/lib/ld-linux-x86-64.so.2 => /lib64/ld-linux-x86-64.so.2 (0x00007fac5da33000)

Still no luck there. Any idea what compiler/linker flag I am missing ?

> 
>> specifically this commit:
>> https://github.com/compudj/glibc-dev/commit/c49a286497d065a7fc00aafd846e6edce14f97fc
> 
> This commit links __rseq_handled into libc.so.6 via rseq-sym.c, but does
> not export it from there.

Moving __rseq_handled to elf/dl-support.c and elf/rtld.c was part of a commit on top.
I've force-pushed the dev branch, and the commit moving __rseq_handled to the
dynamic linker now appears as:
https://github.com/compudj/glibc-dev/commit/f0d4e60e5d0ceb0c2642f99da5af61b6ad988531

Thanks,

Mathieu

Mathieu Desnoyers June 14, 2019, 1:01 p.m. | #23
----- On Jun 14, 2019, at 2:55 PM, Mathieu Desnoyers mathieu.desnoyers@efficios.com wrote:

> ----- On Jun 14, 2019, at 1:35 PM, Florian Weimer fweimer@redhat.com wrote:
> 
>> * Mathieu Desnoyers:
>> 
>>> * Makefile:
>>>
>>> LIBCPATH=/home/efficios/glibc-test/lib
>>> KERNEL_HEADERS=/home/efficios/git/linux-percpu-dev/usr/include
>>> CFLAGS=-I${KERNEL_HEADERS} -L${LIBCPATH} -Wl,--rpath=${LIBCPATH}
>>> -Wl,--dynamic-linker=${LIBCPATH}/ld-linux-x86-64.so.2
>>>
>>> all:
>>> 	gcc ${CFLAGS} -o a a.c
>>> 	gcc ${CFLAGS} -shared -fPIC -o s.so s.c
>> 
>> For me, that does not correctly link against the built libc because the
>> system dynamic loader seeps into the link.
> 
> I have the same issue. I tried adding "-B${LIBCPATH}" as well, but it did
> not seem to help. I still have this ldd output:
> 
> ldd a
> ./a: /lib64/ld-linux-x86-64.so.2: version `GLIBC_2.30' not found (required by
> ./a)
>	linux-vdso.so.1 (0x00007fffaa7e9000)
>	libc.so.6 => /home/efficios/glibc-test/lib/libc.so.6 (0x00007fac5d479000)
>	/home/efficios/glibc-test/lib/ld-linux-x86-64.so.2 =>
>	/lib64/ld-linux-x86-64.so.2 (0x00007fac5da33000)
> 
> Still no luck there. Any idea what compiler/linker flag I am missing ?
> 

Actually, even though ldd seems confused, running the program seems to
use the right ld.so:

efficios@compudjdev:~/test/libc-sym$ ./a
__rseq_handled main: 1 0x55f0ec915020
__rseq_abi.cpu_id main: 28 0x7f54f6c2d4c0
efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
__rseq_handled s.so: 1 0x557350bc6020
__rseq_abi.cpu_id s.so: -1 0x7fe2f30f2680
__rseq_handled main: 1 0x557350bc6020
__rseq_abi.cpu_id main: 27 0x7fe2f30f2680

But my original issue remains: if I define a variable called __rseq_handled
within either the main executable or the preloaded library, it overshadows
the libc one:

efficios@compudjdev:~/test/libc-sym$ ./a
__rseq_handled main: 0 0x56135fd5102c
__rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
__rseq_handled s.so: 0 0x558f70aeb02c
__rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
__rseq_handled main: 0 0x558f70aeb02c
__rseq_abi.cpu_id main: 27 0x7fdca78b7760

Which is unexpected.

This is with my dev branch at this commit:

https://github.com/compudj/glibc-dev/commit/f0d4e60e5d0ceb0c2642f99da5af61b6ad988531

What am I missing ?

Thanks,

Mathieu

Florian Weimer June 14, 2019, 1:09 p.m. | #24
* Mathieu Desnoyers:

> But my original issue remains: if I define a variable called __rseq_handled
> within either the main executable or the preloaded library, it overshadows
> the libc one:
>
> efficios@compudjdev:~/test/libc-sym$ ./a
> __rseq_handled main: 0 0x56135fd5102c
> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
> __rseq_handled s.so: 0 0x558f70aeb02c
> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
> __rseq_handled main: 0 0x558f70aeb02c
> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>
> Which is unexpected.

Why is this unexpected?  It has to be this way if the main program uses
a copy relocation of __rseq_handled.  As long as there is just one
address across the entire program and ld.so initializes the copy of the
variable that is actually used, everything will be fine.

Thanks,
Florian
Mathieu Desnoyers June 14, 2019, 1:18 p.m. | #25
----- On Jun 14, 2019, at 3:09 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> But my original issue remains: if I define a variable called __rseq_handled
>> within either the main executable or the preloaded library, it overshadows
>> the libc one:
>>
>> efficios@compudjdev:~/test/libc-sym$ ./a
>> __rseq_handled main: 0 0x56135fd5102c
>> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
>> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
>> __rseq_handled s.so: 0 0x558f70aeb02c
>> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
>> __rseq_handled main: 0 0x558f70aeb02c
>> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>>
>> Which is unexpected.
> 
> Why is this unexpected?  It has to be this way if the main program uses
> a copy relocation of __rseq_handled.  As long as there is just one
> address across the entire program and ld.so initializes the copy of the
> variable that is actually used, everything will be fine.

Here is a printout of the __rseq_handled address observed by ld.so, it
does not match:

LD_PRELOAD=./s.so ./a
elf: __rseq_handled addr: 7f501c98a140
__rseq_handled s.so: 0 0x55817a88d02c
__rseq_abi.cpu_id s.so: -1 0x7f501c983760
__rseq_handled main: 0 0x55817a88d02c
__rseq_abi.cpu_id main: 27 0x7f501c983760

This is with the following in a.c:

#include <stdio.h>
#include <linux/rseq.h>

__thread struct rseq __rseq_abi
__attribute__ ((tls_model ("initial-exec"))) = {
	.cpu_id = -1,
};
int __rseq_handled;

int main()
{
	fprintf(stderr, "__rseq_handled main: %d %p\n", __rseq_handled, &__rseq_handled);
	fprintf(stderr, "__rseq_abi.cpu_id main: %d %p\n", __rseq_abi.cpu_id, &__rseq_abi);
	return 0;
}

As we can see, the state of __rseq_handled observed by the preloaded
lib and the program is "0", but should really be "1". This can be
explained by ld.so not using the same address as the rest of the
program, but how can we fix that ?

Thanks,

Mathieu

Florian Weimer June 14, 2019, 1:24 p.m. | #26
* Mathieu Desnoyers:

> ----- On Jun 14, 2019, at 3:09 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>> 
>>> But my original issue remains: if I define a variable called __rseq_handled
>>> within either the main executable or the preloaded library, it overshadows
>>> the libc one:
>>>
>>> efficios@compudjdev:~/test/libc-sym$ ./a
>>> __rseq_handled main: 0 0x56135fd5102c
>>> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
>>> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
>>> __rseq_handled s.so: 0 0x558f70aeb02c
>>> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
>>> __rseq_handled main: 0 0x558f70aeb02c
>>> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>>>
>>> Which is unexpected.
>> 
>> Why is this unexpected?  It has to be this way if the main program uses
>> a copy relocation of __rseq_handled.  As long as there is just one
>> address across the entire program and ld.so initializes the copy of the
>> variable that is actually used, everything will be fine.
>
> Here is a printout of the __rseq_handled address observed by ld.so, it
> does not match:
>
> LD_PRELOAD=./s.so ./a
> elf: __rseq_handled addr: 7f501c98a140
> __rseq_handled s.so: 0 0x55817a88d02c
> __rseq_abi.cpu_id s.so: -1 0x7f501c983760
> __rseq_handled main: 0 0x55817a88d02c
> __rseq_abi.cpu_id main: 27 0x7f501c983760

Where do you print the address?  Before or after the self-relocation of
the dynamic loader?  The address is only correct after self-relocation.

Thanks,
Florian
Mathieu Desnoyers June 14, 2019, 1:34 p.m. | #27
----- On Jun 14, 2019, at 3:24 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> ----- On Jun 14, 2019, at 3:09 PM, Florian Weimer fweimer@redhat.com wrote:
>>
>>> * Mathieu Desnoyers:
>>> 
>>>> But my original issue remains: if I define a variable called __rseq_handled
>>>> within either the main executable or the preloaded library, it overshadows
>>>> the libc one:
>>>>
>>>> efficios@compudjdev:~/test/libc-sym$ ./a
>>>> __rseq_handled main: 0 0x56135fd5102c
>>>> __rseq_abi.cpu_id main: 29 0x7fcbeca6d5a0
>>>> efficios@compudjdev:~/test/libc-sym$ LD_PRELOAD=./s.so ./a
>>>> __rseq_handled s.so: 0 0x558f70aeb02c
>>>> __rseq_abi.cpu_id s.so: -1 0x7fdca78b7760
>>>> __rseq_handled main: 0 0x558f70aeb02c
>>>> __rseq_abi.cpu_id main: 27 0x7fdca78b7760
>>>>
>>>> Which is unexpected.
>>> 
>>> Why is this unexpected?  It has to be this way if the main program uses
>>> a copy relocation of __rseq_handled.  As long as there is just one
>>> address across the entire program and ld.so initializes the copy of the
>>> variable that is actually used, everything will be fine.
>>
>> Here is a printout of the __rseq_handled address observed by ld.so, it
>> does not match:
>>
>> LD_PRELOAD=./s.so ./a
>> elf: __rseq_handled addr: 7f501c98a140
>> __rseq_handled s.so: 0 0x55817a88d02c
>> __rseq_abi.cpu_id s.so: -1 0x7f501c983760
>> __rseq_handled main: 0 0x55817a88d02c
>> __rseq_abi.cpu_id main: 27 0x7f501c983760
> 
> Where do you print the address?  Before or after the self-relocation of
> the dynamic loader?  The address is only correct after self-relocation.

I printed the address within rseq_init (), which happened to be invoked
way too early during dynamic linker startup. I followed your advice and moved
the rseq_init () invocation after the dynamic linker's re-relocation:

diff --git a/elf/rtld.c b/elf/rtld.c
index f29f284a7c..66b0894f9d 100644
--- a/elf/rtld.c
+++ b/elf/rtld.c
@@ -1410,9 +1410,6 @@ ERROR: '%s': cannot process note segment.\n", _dl_argv[0]);
     /* Assign a module ID.  Do this before loading any audit modules.  */
     GL(dl_rtld_map).l_tls_modid = _dl_next_tls_modid ();
 
-  /* Publicize rseq registration ownership.  */
-  rseq_init ();
-
   /* If we have auditing DSOs to load, do it now.  */
   bool need_security_init = true;
   if (__glibc_unlikely (audit_list != NULL)
@@ -2284,6 +2281,11 @@ ERROR: ld.so: object '%s' cannot be loaded as audit interface: %s; ignored.\n",
       HP_TIMING_ACCUM_NT (relocate_time, add);
     }
 
+  /* Publicize rseq registration ownership.  This must be performed
+     after rtld re-relocation, before invoking constructors of
+     preloaded libraries.  */
+  rseq_init ();
+
   /* Do any necessary cleanups for the startup OS interface code.
      We do these now so that no calls are made after rtld re-relocation
      which might be resolved to different functions than we expect.

It works fine now!

LD_PRELOAD=./s.so ./a
elf: __rseq_handled addr: 56300f0a402c
__rseq_handled s.so: 1 0x56300f0a402c
__rseq_abi.cpu_id s.so: -1 0x7fad2ff58760
__rseq_handled main: 1 0x56300f0a402c
__rseq_abi.cpu_id main: 27 0x7fad2ff58760

Thanks!

Mathieu

Mathieu Desnoyers June 14, 2019, 1:39 p.m. | #28
----- On Jun 14, 2019, at 3:29 PM, David Laight David.Laight@ACULAB.COM wrote:

> From: Mathieu Desnoyers
>> Sent: 14 June 2019 14:02
> ...
>> But my original issue remains: if I define a variable called __rseq_handled
>> within either the main executable or the preloaded library, it overshadows
>> the libc one:
> 
> 1) That is the way ELF symbol resolution is required to work.
>   Otherwise variables like 'errno' (non-thread safe form) wouldn't work.
> 
> 2) Don't do it then :-)
>   Names starting with __ will be reserved (probably 'for the implementation').
> 
> The real 'fun' starts because, under some circumstances, looking up a symbol as:
>	foo = dlsym(lib_handle, "foo");
> can find the data item instead of the function!
> Usually it works (even when foo is global data) because 'lib_handle' refers
> to a different symbol table.
> But it can go horribly wrong.

I was setting __rseq_handled too soon, before re-relocation of the dynamic linker.
I moved the initialization after re-relocation and it works fine now.

The purpose of __rseq_handled is to allow early adopter libraries and applications
to define their own global instance of the symbol, and check whether the libc
they are linked against handles rseq registration or not.

libc specifies the layout of that variable (an integer). The dynamic linker
picks one of those instances, and that instance is the one every module uses
through the program's global symbol table. The important thing is that all
libraries agree on that global symbol. Of course this is not compatible with
libraries built with all their symbols forced to "hidden" visibility.
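A minimal sketch of how an early-adopter library might follow this ownership scheme, assuming the semantics discussed in this thread (a global int __rseq_handled that a rseq-aware glibc sets to 1 before constructors run). This is not glibc code: the struct mirrors the kernel's struct rseq layout from <linux/rseq.h> so that kernel headers are not required, the names lib_rseq, lib_rseq_area, lib_rseq_registered and lib_rseq_init are hypothetical, and the signature 0x53053053 (the x86 RSEQ_SIG value) is an assumption. Only the thread running the constructor is registered; other threads would need the same per-thread step.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Early-adopter definition of the ownership flag.  The dynamic linker
   binds all modules to one instance; a glibc that registers rseq
   itself would set it to 1 before constructors run.  */
int __rseq_handled;

/* Hypothetical local copy of the kernel's struct rseq layout.  The
   kernel expects a 32-byte area with 32-byte alignment.  */
struct lib_rseq
{
  uint32_t cpu_id_start;
  uint32_t cpu_id;
  uint64_t rseq_cs;   /* Address of the active critical section, or 0.  */
  uint32_t flags;
} __attribute__ ((aligned (32)));

/* Assumed abort-handler signature (the value x86 uses).  */
#define LIB_RSEQ_SIG 0x53053053

/* cpu_id stays at -1 (uninitialized) until the kernel fills it in.  */
static __thread struct lib_rseq lib_rseq_area = { .cpu_id = (uint32_t) -1 };

/* Nonzero if this library registered the calling thread itself.  */
static int lib_rseq_registered;

static void __attribute__ ((constructor))
lib_rseq_init (void)
{
  /* libc owns registration: rely on its __rseq_abi instead.  */
  if (__rseq_handled)
    return;

#ifdef SYS_rseq
  /* rseq (area, length, flags, signature) returns 0 on success.  */
  if (syscall (SYS_rseq, &lib_rseq_area, sizeof lib_rseq_area, 0,
               LIB_RSEQ_SIG) == 0)
    lib_rseq_registered = 1;
#endif
  /* On failure (for example ENOSYS before Linux 4.18, or an error
     because another area is already registered for this thread), the
     library simply falls back to code paths that do not use rseq.  */
}
```

The design choice is that the library never fights libc over registration: when __rseq_handled is set it stays out of the way, and when it is not, a failed registration is treated as "rseq unavailable" rather than as a fatal error.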

Thanks,

Mathieu

Florian Weimer June 14, 2019, 1:42 p.m. | #29
* Mathieu Desnoyers:

> +  /* Publicize rseq registration ownership.  This must be performed
> +     after rtld re-relocation, before invoking constructors of
> +     preloaded libraries.  */
> +  rseq_init ();

Please add a comment that IFUNC resolvers do not see the initialized
value.  I think this is okay because we currently do not support access
to extern variables in IFUNC resolvers.

>    /* Do any necessary cleanups for the startup OS interface code.
>       We do these now so that no calls are made after rtld re-relocation
>       which might be resolved to different functions than we expect.
>
> It works fine now!

Great.

Thanks,
Florian
Mathieu Desnoyers June 14, 2019, 1:47 p.m. | #30
----- On Jun 14, 2019, at 3:42 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> +  /* Publicize rseq registration ownership.  This must be performed
>> +     after rtld re-relocation, before invoking constructors of
>> +     preloaded libraries.  */
>> +  rseq_init ();
> 
> Please add a comment that IFUNC resolvers do not see the initialized
> value.  I think this is okay because we currently do not support access
> to extern variables in IFUNC resolvers.

Do IFUNC resolvers happen to observe the __rseq_handled address that
was internal to ld.so ?

If so, we could simply initialize __rseq_handled twice: early before calling
IFUNC resolvers, and after ld.so re-relocation.

Thanks,

Mathieu

Florian Weimer June 14, 2019, 1:53 p.m. | #31
* Mathieu Desnoyers:

> ----- On Jun 14, 2019, at 3:42 PM, Florian Weimer fweimer@redhat.com wrote:
>
>> * Mathieu Desnoyers:
>> 
>>> +  /* Publicize rseq registration ownership.  This must be performed
>>> +     after rtld re-relocation, before invoking constructors of
>>> +     preloaded libraries.  */
>>> +  rseq_init ();
>> 
>> Please add a comment that IFUNC resolvers do not see the initialized
>> value.  I think this is okay because we currently do not support access
>> to extern variables in IFUNC resolvers.
>
> Do IFUNC resolvers happen to observe the __rseq_handled address that
> was internal to ld.so ?

They should observe the correct address, but they can access the
variable before initialization.  An initializer in ld.so will not have
an effect if an interposed definition initialized the variable to
something else.

> If so, we could simply initialize __rseq_handled twice: early before calling
> IFUNC resolvers, and after ld.so re-relocation.

No, I don't think this will make a difference.

Thanks,
Florian
Mathieu Desnoyers June 14, 2019, 1:59 p.m. | #32
----- On Jun 14, 2019, at 3:53 PM, Florian Weimer fweimer@redhat.com wrote:

> * Mathieu Desnoyers:
> 
>> ----- On Jun 14, 2019, at 3:42 PM, Florian Weimer fweimer@redhat.com wrote:
>>
>>> * Mathieu Desnoyers:
>>> 
>>>> +  /* Publicize rseq registration ownership.  This must be performed
>>>> +     after rtld re-relocation, before invoking constructors of
>>>> +     preloaded libraries.  */
>>>> +  rseq_init ();
>>> 
>>> Please add a comment that IFUNC resolvers do not see the initialized
>>> value.  I think this is okay because we currently do not support access
>>> to extern variables in IFUNC resolvers.
>>
>> Do IFUNC resolvers happen to observe the __rseq_handled address that
>> was internal to ld.so ?
> 
> They should observe the correct address, but they can access the
> variable before initialization.  An initializer in ld.so will not have
> an effect if an interposed definition initialized the variable to
> something else.
> 
>> If so, we could simply initialize __rseq_handled twice: early before calling
>> IFUNC resolvers, and after ld.so re-relocation.
> 
> No, I don't think this will make a difference.

So comment it is:

  /* Publicize rseq registration ownership.  This must be performed
     after rtld re-relocation, before invoking constructors of
     preloaded libraries. IFUNC resolvers are called before this
     initialization, so they may not observe the initialized state.  */
  rseq_init ();
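Given that comment, and Florian's note that access to extern variables from IFUNC resolvers is not supported, one way for rseq-aware code to cope is to keep the resolver free of any __rseq_handled check and read the flag at call time instead, by which point rtld has published it. A minimal sketch, assuming GCC or Clang on an ELF/glibc target with IFUNC support; get_mode, get_mode_impl and resolve_get_mode are hypothetical names, and __rseq_handled is defined locally the way an early adopter would.

```c
/* Early-adopter definition of the ownership flag.  */
int __rseq_handled;

/* The rseq-dependent decision is made when the function is called,
   after rtld has run rseq_init (), so the flag is meaningful here.  */
static int
get_mode_impl (void)
{
  return __rseq_handled ? 1 : 0;
}

/* IFUNC resolvers may run during relocation processing, before
   rseq_init (), so this resolver deliberately does not read
   __rseq_handled (or any other extern variable).  */
static int (*resolve_get_mode (void)) (void)
{
  return get_mode_impl;
}

/* Calls to get_mode () are bound to whatever the resolver returns.  */
int get_mode (void) __attribute__ ((ifunc ("resolve_get_mode")));
```

A real resolver would typically select among implementations using static criteria such as CPU features; the point of the sketch is only that the ownership check moves out of the resolver.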

Thanks,

Mathieu


Patch

diff --git a/ChangeLog b/ChangeLog
index 59dab18463..459af8f1a5 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,50 @@ 
+2019-04-23  Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
+
+	* NEWS: Add Restartable Sequences feature description.
+	* csu/libc-start.c: Perform rseq(2) registration at C startup and
+	thread creation.
+	* nptl/pthread_create.c: Likewise.
+	* sysdeps/unix/sysv/linux/Makefile: Add rseq-sym, sys/rseq.h,
+	bits/rseq.h.
+	* sysdeps/unix/sysv/linux/Versions: Export __rseq_abi and
+	__rseq_handled from libc.
+	* sysdeps/unix/sysv/linux/aarch64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/alpha/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/arm/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/csky/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/hppa/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/i386/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/ia64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/microblaze/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/nios2/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist:
+	Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sh/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/x86_64/64/libc.abilist: Likewise.
+	* sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist: Likewise.
+	* misc/rseq-internal.h: New file.
+	* sysdeps/unix/sysv/linux/rseq-internal.h: Likewise.
+	* sysdeps/unix/sysv/linux/rseq-sym.c: Likewise.
+	* sysdeps/unix/sysv/linux/sys/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/aarch64/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/s390/bits/rseq.h: Likewise.
+	* sysdeps/unix/sysv/linux/x86/bits/rseq.h: Likewise.
+
 2019-01-31  Siddhesh Poyarekar  <siddhesh@sourceware.org>
 
 	* version.h (RELEASE): Set to "stable".
diff --git a/NEWS b/NEWS
index 912a9bdc0f..7276a09b08 100644
--- a/NEWS
+++ b/NEWS
@@ -5,6 +5,21 @@  See the end for copying conditions.
 Please send GNU C library bug reports via <https://sourceware.org/bugzilla/>
 using `glibc' in the "product" field.
 
+Version 2.30
+
+Major new features:
+
+* Support for automatically registering threads with the Linux rseq(2)
+  system call has been added.  This system call is implemented starting
+  from Linux 4.18.  The Restartable Sequences ABI accelerates user-space
+  operations on per-cpu data.  It allows user-space to perform updates
+  on per-cpu data without requiring heavy-weight atomic operations.
+  Automatically registering threads allows all libraries, including libc,
+  to make immediate use of the rseq(2) support by using the documented ABI.
+  See 'man 2 rseq' for the details of the ABI shared between libc and the
+  kernel.
+
+
 Version 2.29
 
 Major new features:
diff --git a/csu/libc-start.c b/csu/libc-start.c
index 5d9c3675fa..e101196b0d 100644
--- a/csu/libc-start.c
+++ b/csu/libc-start.c
@@ -22,6 +22,7 @@ 
 #include <ldsodefs.h>
 #include <exit-thread.h>
 #include <libc-internal.h>
+#include <rseq-internal.h>
 
 #include <elf/dl-tunables.h>
 
@@ -140,7 +141,12 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 
   __libc_multiple_libcs = &_dl_starting_up && !_dl_starting_up;
 
-#ifndef SHARED
+  rseq_init ();
+
+#ifdef SHARED
+  /* Register rseq ABI to the kernel. */
+  (void) rseq_register_current_thread ();
+#else
   _dl_relocate_static_pie ();
 
   char **ev = &argv[argc + 1];
@@ -218,6 +224,9 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
     }
 # endif
 
+  /* Register rseq ABI to the kernel. */
+  (void) rseq_register_current_thread ();
+
   /* Initialize libpthread if linked in.  */
   if (__pthread_initialize_minimal != NULL)
     __pthread_initialize_minimal ();
@@ -230,8 +239,7 @@  LIBC_START_MAIN (int (*main) (int, char **, char ** MAIN_AUXVEC_DECL),
 # else
   __pointer_chk_guard_local = pointer_chk_guard;
 # endif
-
-#endif /* !SHARED  */
+#endif
 
   /* Register the destructor of the dynamic linker if there is any.  */
   if (__glibc_likely (rtld_fini != NULL))
diff --git a/misc/rseq-internal.h b/misc/rseq-internal.h
new file mode 100644
index 0000000000..ccad30bca5
--- /dev/null
+++ b/misc/rseq-internal.h
@@ -0,0 +1,38 @@ 
+/* Restartable Sequences internal API. Stub version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+
+static inline void
+rseq_init (void)
+{
+}
+
+#endif /* rseq-internal.h */
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 2bd2b10727..90b3419390 100644
--- a/nptl/pthread_create.c
+++ b/nptl/pthread_create.c
@@ -33,6 +33,7 @@ 
 #include <default-sched.h>
 #include <futex-internal.h>
 #include <tls-setup.h>
+#include <rseq-internal.h>
 #include "libioP.h"
 
 #include <shlib-compat.h>
@@ -378,6 +379,7 @@  __free_tcb (struct pthread *pd)
 START_THREAD_DEFN
 {
   struct pthread *pd = START_THREAD_SELF;
+  bool has_rseq = false;
 
 #if HP_TIMING_AVAIL
   /* Remember the time when the thread was started.  */
@@ -396,6 +398,9 @@  START_THREAD_DEFN
   if (__glibc_unlikely (atomic_exchange_acq (&pd->setxid_futex, 0) == -2))
     futex_wake (&pd->setxid_futex, 1, FUTEX_PRIVATE);
 
+  /* Register rseq TLS to the kernel. */
+  has_rseq = !rseq_register_current_thread ();
+
 #ifdef __NR_set_robust_list
 # ifndef __ASSUME_SET_ROBUST_LIST
   if (__set_robust_list_avail >= 0)
@@ -573,6 +578,10 @@  START_THREAD_DEFN
     }
 #endif
 
+  /* Unregister rseq TLS from kernel. */
+  if (has_rseq && rseq_unregister_current_thread ())
+    abort();
+
   advise_stack_range (pd->stackblock, pd->stackblock_size, (uintptr_t) pd,
 		      pd->guardsize);
 
diff --git a/sysdeps/unix/sysv/linux/Makefile b/sysdeps/unix/sysv/linux/Makefile
index 5f8c2c7c7d..5b541469ec 100644
--- a/sysdeps/unix/sysv/linux/Makefile
+++ b/sysdeps/unix/sysv/linux/Makefile
@@ -1,5 +1,5 @@ 
 ifeq ($(subdir),csu)
-sysdep_routines += errno-loc
+sysdep_routines += errno-loc rseq-sym
 endif
 
 ifeq ($(subdir),assert)
@@ -48,7 +48,7 @@  sysdep_headers += sys/mount.h sys/acct.h sys/sysctl.h \
 		  bits/termios-c_iflag.h bits/termios-c_oflag.h \
 		  bits/termios-baud.h bits/termios-c_cflag.h \
 		  bits/termios-c_lflag.h bits/termios-tcflow.h \
-		  bits/termios-misc.h
+		  bits/termios-misc.h sys/rseq.h bits/rseq.h
 
 tests += tst-clone tst-clone2 tst-clone3 tst-fanotify tst-personality \
 	 tst-quota tst-sync_file_range tst-sysconf-iov_max tst-ttyname \
diff --git a/sysdeps/unix/sysv/linux/Versions b/sysdeps/unix/sysv/linux/Versions
index f1e12d9c69..bee3d727e5 100644
--- a/sysdeps/unix/sysv/linux/Versions
+++ b/sysdeps/unix/sysv/linux/Versions
@@ -174,6 +174,10 @@  libc {
   GLIBC_2.29 {
     getcpu;
   }
+  GLIBC_2.30 {
+    __rseq_abi;
+    __rseq_handled;
+  }
   GLIBC_PRIVATE {
     # functions used in other libraries
     __syscall_rt_sigqueueinfo;
diff --git a/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
new file mode 100644
index 0000000000..35fcc41f1e
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/aarch64/bits/rseq.h
@@ -0,0 +1,43 @@ 
+/* Restartable Sequences Linux aarch64 architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.
+
+   aarch64 -mbig-endian generates mixed endianness code vs data:
+   little-endian code and big-endian data. Ensure the RSEQ_SIG signature
+   matches code endianness.  */
+
+#define RSEQ_SIG_CODE	0xd428bc00	/* BRK #0x45E0.  */
+
+#ifdef __AARCH64EB__
+#define RSEQ_SIG_DATA	0x00bc28d4	/* BRK #0x45E0.  */
+#else
+#define RSEQ_SIG_DATA	RSEQ_SIG_CODE
+#endif
+
+#define RSEQ_SIG	RSEQ_SIG_DATA
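RSEQ_SIG_CODE and the big-endian RSEQ_SIG_DATA above are the same BRK #0x45E0 instruction, viewed once as a little-endian instruction word and once as the result of a big-endian data load. The sketch below reproduces both values. It assumes only the standard A64 BRK encoding: the fixed opcode 0xd4200000 with the 16-bit immediate in bits 20:5. The helper names are illustrative.

```c
#include <stdint.h>

/* Encode A64 "BRK #imm16": fixed opcode with imm16 in bits [20:5].  */
static uint32_t aarch64_brk (uint16_t imm16)
{
  return UINT32_C (0xd4200000) | ((uint32_t) imm16 << 5);
}

/* Reverse the byte order.  A big-endian data load of the little-endian
   instruction bytes yields this value.  */
static uint32_t bswap32_sketch (uint32_t x)
{
  return (x >> 24) | ((x >> 8) & 0x0000ff00u)
         | ((x << 8) & 0x00ff0000u) | (x << 24);
}
```

For example, aarch64_brk (0x45E0) gives 0xd428bc00, and swapping its bytes gives 0x00bc28d4, which matches the __AARCH64EB__ branch.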
diff --git a/sysdeps/unix/sysv/linux/aarch64/libc.abilist b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
index 9c330f325e..331f39e41a 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libc.abilist
@@ -2141,3 +2141,5 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/alpha/libc.abilist b/sysdeps/unix/sysv/linux/alpha/libc.abilist
index f630fa4c6f..05dfdd3393 100644
--- a/sysdeps/unix/sysv/linux/alpha/libc.abilist
+++ b/sysdeps/unix/sysv/linux/alpha/libc.abilist
@@ -2204,6 +2204,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/arm/libc.abilist b/sysdeps/unix/sysv/linux/arm/libc.abilist
index b96f45590f..24e9b89a50 100644
--- a/sysdeps/unix/sysv/linux/arm/libc.abilist
+++ b/sysdeps/unix/sysv/linux/arm/libc.abilist
@@ -126,6 +126,8 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0xa0
 GLIBC_2.4 _IO_2_1_stdin_ D 0xa0
diff --git a/sysdeps/unix/sysv/linux/bits/rseq.h b/sysdeps/unix/sysv/linux/bits/rseq.h
new file mode 100644
index 0000000000..a3c023f5c7
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/bits/rseq.h
@@ -0,0 +1,29 @@ 
+/* Restartable Sequences architecture header. Stub version.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler.
+
+   It is a 32-bit value that maps to actual architecture code compiled
+   into applications and libraries. It needs to be defined for each
+   architecture. When choosing this value, it needs to be taken into
+   account that generating invalid instructions may have ill effects on
+   tools like objdump, and may also have impact on the CPU speculative
+   execution efficiency in some cases.  */
diff --git a/sysdeps/unix/sysv/linux/csky/libc.abilist b/sysdeps/unix/sysv/linux/csky/libc.abilist
index 019044c3cd..e2b0538088 100644
--- a/sysdeps/unix/sysv/linux/csky/libc.abilist
+++ b/sysdeps/unix/sysv/linux/csky/libc.abilist
@@ -2085,3 +2085,5 @@  GLIBC_2.29 xdrstdio_create F
 GLIBC_2.29 xencrypt F
 GLIBC_2.29 xprt_register F
 GLIBC_2.29 xprt_unregister F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/hppa/libc.abilist b/sysdeps/unix/sysv/linux/hppa/libc.abilist
index 088a8ee369..263a91b97e 100644
--- a/sysdeps/unix/sysv/linux/hppa/libc.abilist
+++ b/sysdeps/unix/sysv/linux/hppa/libc.abilist
@@ -2037,6 +2037,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/i386/libc.abilist b/sysdeps/unix/sysv/linux/i386/libc.abilist
index f7ff2c57b9..18ce09d48a 100644
--- a/sysdeps/unix/sysv/linux/i386/libc.abilist
+++ b/sysdeps/unix/sysv/linux/i386/libc.abilist
@@ -2203,6 +2203,8 @@  GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 vm86 F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/ia64/libc.abilist b/sysdeps/unix/sysv/linux/ia64/libc.abilist
index becd8b1033..b61e2ee010 100644
--- a/sysdeps/unix/sysv/linux/ia64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/ia64/libc.abilist
@@ -2069,6 +2069,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
index 74e42a5209..e55792bb22 100644
--- a/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/coldfire/libc.abilist
@@ -127,6 +127,8 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _Exit F
 GLIBC_2.4 _IO_2_1_stderr_ D 0x98
 GLIBC_2.4 _IO_2_1_stdin_ D 0x98
diff --git a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
index 4af5a74e8a..9845499048 100644
--- a/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
+++ b/sysdeps/unix/sysv/linux/m68k/m680x0/libc.abilist
@@ -2146,6 +2146,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/microblaze/libc.abilist b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
index ccef673fd2..1aba8cb86c 100644
--- a/sysdeps/unix/sysv/linux/microblaze/libc.abilist
+++ b/sysdeps/unix/sysv/linux/microblaze/libc.abilist
@@ -2133,3 +2133,5 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
index 1054bb599e..df54e2adab 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/fpu/libc.abilist
@@ -2120,6 +2120,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
index 4f5b5ffebf..ce95ae7e86 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips32/nofpu/libc.abilist
@@ -2118,6 +2118,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
index 943aee58d4..c9fb5d2096 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n32/libc.abilist
@@ -2126,6 +2126,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
index 17a5d17ef9..6335df9acf 100644
--- a/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/mips/mips64/n64/libc.abilist
@@ -2120,6 +2120,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/nios2/libc.abilist b/sysdeps/unix/sysv/linux/nios2/libc.abilist
index 4d62a540fd..5465b96768 100644
--- a/sysdeps/unix/sysv/linux/nios2/libc.abilist
+++ b/sysdeps/unix/sysv/linux/nios2/libc.abilist
@@ -2174,3 +2174,5 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
index ecc2d6fa13..eb3808dbd4 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/fpu/libc.abilist
@@ -2164,6 +2164,8 @@  GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
index f5830f9c33..6a49a7b718 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc32/nofpu/libc.abilist
@@ -2197,6 +2197,8 @@  GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
index 633d8f4792..83177dc75f 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/be/libc.abilist
@@ -2027,6 +2027,8 @@  GLIBC_2.3.4 siglongjmp F
 GLIBC_2.3.4 swapcontext F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
index 2c712636ef..e714de994c 100644
--- a/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
+++ b/sysdeps/unix/sysv/linux/powerpc/powerpc64/le/libc.abilist
@@ -2231,3 +2231,5 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
index 195bc8b2cf..d190623993 100644
--- a/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/riscv/rv64/libc.abilist
@@ -2103,3 +2103,5 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
diff --git a/sysdeps/unix/sysv/linux/rseq-internal.h b/sysdeps/unix/sysv/linux/rseq-internal.h
new file mode 100644
index 0000000000..edb31b1c3c
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-internal.h
@@ -0,0 +1,88 @@ 
+/* Restartable Sequences internal API. Linux implementation.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef RSEQ_INTERNAL_H
+#define RSEQ_INTERNAL_H
+
+#include <sysdep.h>
+#include <errno.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#endif
+
+#if defined __NR_rseq && defined RSEQ_SIG
+
+static inline int
+rseq_register_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  if (__rseq_abi.cpu_id == RSEQ_CPU_ID_REGISTRATION_FAILED)
+    return -1;
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              0, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  if (INTERNAL_SYSCALL_ERRNO (rc, err) != EBUSY)
+    __rseq_abi.cpu_id = RSEQ_CPU_ID_REGISTRATION_FAILED;
+  ret = -1;
+end:
+  return ret;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  int rc, ret = 0;
+  INTERNAL_SYSCALL_DECL (err);
+
+  rc = INTERNAL_SYSCALL_CALL (rseq, err, &__rseq_abi, sizeof (struct rseq),
+                              RSEQ_FLAG_UNREGISTER, RSEQ_SIG);
+  if (!rc)
+    goto end;
+  ret = -1;
+end:
+  return ret;
+}
+
+static inline void
+rseq_init (void)
+{
+  __rseq_handled = 1;
+}
+#else
+static inline int
+rseq_register_current_thread (void)
+{
+  return -1;
+}
+
+static inline int
+rseq_unregister_current_thread (void)
+{
+  return -1;
+}
+
+static inline void
+rseq_init (void)
+{
+}
+#endif
+
+#endif /* rseq-internal.h */
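rseq_register_current_thread distinguishes three outcomes. Success returns 0. EBUSY means this area is already registered, for example by an early-adopter library. In that case the area's state is left untouched, but -1 is still returned so that the thread does not later unregister an area it does not own. Any other error marks cpu_id as RSEQ_CPU_ID_REGISTRATION_FAILED, so readers of the area can tell that rseq is unavailable and not merely uninitialized. Further attempts on that area then return -1 without a system call. The sketch below isolates this logic. The sk_* names and the fake system calls are hypothetical stand-ins for INTERNAL_SYSCALL_CALL, used so that the logic can run without kernel support.

```c
#include <errno.h>
#include <stdint.h>

/* Values of enum rseq_cpu_id_state.  */
enum { SK_CPU_ID_UNINITIALIZED = -1, SK_CPU_ID_REGISTRATION_FAILED = -2 };

/* Only the field that the registration logic inspects.  */
struct sk_area { int32_t cpu_id; };

/* Hypothetical syscall stand-in: 0 on success, or a negated errno.  */
typedef int (*sk_rseq_fn) (struct sk_area *);

static int sk_register (struct sk_area *a, sk_rseq_fn rseq_call)
{
  /* A recorded permanent failure short-circuits further attempts.  */
  if (a->cpu_id == SK_CPU_ID_REGISTRATION_FAILED)
    return -1;
  int rc = rseq_call (a);
  if (rc == 0)
    return 0;
  /* EBUSY: already registered by someone else; leave the area alone.  */
  if (rc != -EBUSY)
    a->cpu_id = SK_CPU_ID_REGISTRATION_FAILED;
  return -1;
}

/* Fake system calls used to exercise each branch.  */
static int sk_fake_ok (struct sk_area *a) { a->cpu_id = 0; return 0; }
static int sk_fake_busy (struct sk_area *a) { (void) a; return -EBUSY; }
static int sk_fake_nosys (struct sk_area *a) { (void) a; return -ENOSYS; }
```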
diff --git a/sysdeps/unix/sysv/linux/rseq-sym.c b/sysdeps/unix/sysv/linux/rseq-sym.c
new file mode 100644
index 0000000000..8e3abab3d0
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/rseq-sym.c
@@ -0,0 +1,63 @@ 
+/* Restartable Sequences exported symbols. Linux Implementation.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/syscall.h>
+#include <stdint.h>
+
+#ifdef __NR_rseq
+#include <sys/rseq.h>
+#else
+
+enum rseq_cpu_id_state {
+  RSEQ_CPU_ID_UNINITIALIZED = -1,
+  RSEQ_CPU_ID_REGISTRATION_FAILED = -2,
+};
+
+/* linux/rseq.h defines struct rseq as aligned on 32 bytes. The kernel ABI
+   size is 20 bytes.  */
+struct rseq {
+  uint32_t cpu_id_start;
+  uint32_t cpu_id;
+  uint64_t rseq_cs;
+  uint32_t flags;
+} __attribute__ ((aligned (4 * sizeof (uint64_t))));
+
+#endif
+
+/* volatile because fields can be read/updated by the kernel.  */
+__thread volatile struct rseq __rseq_abi = {
+  .cpu_id = RSEQ_CPU_ID_UNINITIALIZED,
+};
+
+/* Advertise Restartable Sequences registration ownership across
+   application and shared libraries.
+
+   Libraries and applications must check whether this variable is zero or
+   non-zero if they wish to perform rseq registration on their own. If it
+   is zero, it means restartable sequence registration is not handled, and
+   the library or application is free to perform rseq registration. In
+   that case, the library or application is taking ownership of rseq
+   registration, and may set __rseq_handled to 1. It may then set it back
+   to 0 after it completes unregistering rseq.
+
+   If __rseq_handled is found to be non-zero, it means that another
+   library (or the application) is currently handling rseq registration.
+
+   Typical use of __rseq_handled is within library constructors and
+   destructors, or at program startup.  */
+
+int __rseq_handled;
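The fallback struct rseq in rseq-sym.c covers 20 bytes of kernel ABI, but its 32-byte alignment pads it to 32 bytes. This is the 0x20 size recorded for __rseq_abi in every libc.abilist above. The sketch below checks that arithmetic on a local copy of the layout. The type name is illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* Same fields and alignment as the fallback struct rseq above.  */
struct rseq_layout_sketch {
  uint32_t cpu_id_start;   /* offset 0 */
  uint32_t cpu_id;         /* offset 4 */
  uint64_t rseq_cs;        /* offset 8 */
  uint32_t flags;          /* offset 16; ABI data ends at byte 20 */
} __attribute__ ((aligned (4 * sizeof (uint64_t))));

_Static_assert (offsetof (struct rseq_layout_sketch, flags)
                + sizeof (uint32_t) == 20,
                "kernel ABI data occupies the first 20 bytes");
_Static_assert (sizeof (struct rseq_layout_sketch) == 32,
                "size is rounded up to the 32-byte alignment");
```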
diff --git a/sysdeps/unix/sysv/linux/s390/bits/rseq.h b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
new file mode 100644
index 0000000000..0ed16c23a4
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/s390/bits/rseq.h
@@ -0,0 +1,30 @@ 
+/* Restartable Sequences Linux s390 architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler.
+
+   RSEQ_SIG uses the trap4 instruction.  As Linux uses neither the
+   access-register mode nor the linkage stack, this instruction will always
+   cause a special-operation exception (the trap-enabled bit in the DUCT
+   is and will stay 0).  The instruction pattern is
+	b2 ff 0f ff	trap4   4095(%r0)  */
+
+#define RSEQ_SIG	0xB2FF0FFF
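s390 is big-endian for both code and data, so the trap4 instruction bytes listed above, read as a 32-bit word, equal RSEQ_SIG directly. No separate code and data variants are needed, unlike aarch64 -mbig-endian. The short sketch below illustrates this with an illustrative helper name.

```c
#include <stdint.h>

/* "trap4 4095(%r0)" as listed in the comment above.  */
static const unsigned char s390_trap4_insn[4] = { 0xb2, 0xff, 0x0f, 0xff };

/* Big-endian 32-bit load, which matches how s390 reads data.  */
static uint32_t load_be32 (const unsigned char *b)
{
  return ((uint32_t) b[0] << 24) | ((uint32_t) b[1] << 16)
         | ((uint32_t) b[2] << 8) | (uint32_t) b[3];
}
```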
diff --git a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
index 334def033c..dacae17ec4 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-32/libc.abilist
@@ -2159,6 +2159,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
index 536f4c4ced..c277b3bd90 100644
--- a/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/s390/s390-64/libc.abilist
@@ -2063,6 +2063,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sh/libc.abilist b/sysdeps/unix/sysv/linux/sh/libc.abilist
index 30ae3b6ebb..5f70e5c53b 100644
--- a/sysdeps/unix/sysv/linux/sh/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sh/libc.abilist
@@ -2041,6 +2041,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
index 68b107d080..537da009d3 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc32/libc.abilist
@@ -2153,6 +2153,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 _IO_fprintf F
 GLIBC_2.4 _IO_printf F
 GLIBC_2.4 _IO_sprintf F
diff --git a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
index e5b6a4da50..1fee8e34fc 100644
--- a/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/sparc/sparc64/libc.abilist
@@ -2092,6 +2092,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/sys/rseq.h b/sysdeps/unix/sysv/linux/sys/rseq.h
new file mode 100644
index 0000000000..5698f4a96d
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/sys/rseq.h
@@ -0,0 +1,50 @@ 
+/* Restartable Sequences exported symbols. Linux header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+#define _SYS_RSEQ_H	1
+
+/* We use the structure declarations from the kernel headers.  */
+#include <linux/rseq.h>
+/* Architecture-specific rseq signature.  */
+#include <bits/rseq.h>
+#include <stdint.h>
+
+/* volatile because fields can be read/updated by the kernel.  */
+extern __thread volatile struct rseq __rseq_abi
+__attribute__ ((tls_model ("initial-exec")));
+
+/* Advertise Restartable Sequences registration ownership across
+   application and shared libraries.
+
+   Libraries and applications must check whether this variable is zero or
+   non-zero if they wish to perform rseq registration on their own. If it
+   is zero, it means restartable sequence registration is not handled, and
+   the library or application is free to perform rseq registration. In
+   that case, the library or application is taking ownership of rseq
+   registration, and may set __rseq_handled to 1. It may then set it back
+   to 0 after it completes unregistering rseq.
+
+   If __rseq_handled is found to be non-zero, it means that another
+   library (or the application) is currently handling rseq registration.
+
+   Typical use of __rseq_handled is within library constructors and
+   destructors, or at program startup.  */
+
+extern int __rseq_handled;
+
+#endif /* sys/rseq.h */
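The __rseq_handled comment describes an ownership handshake for library constructors and destructors. __rseq_handled is defined only when this patch is applied, so the sketch below uses a local stand-in variable. The lib_* names are hypothetical. The sketch uses plain accesses because constructors and destructors normally run serialized at load and unload time. With this patch, rseq_init () sets __rseq_handled to 1 at startup, so a constructor written this way would leave registration to glibc.

```c
#include <stdbool.h>

/* Stand-in for the __rseq_handled symbol proposed by this patch.  */
static int sketch_rseq_handled;

/* Hypothetical per-library record of whether this library took
   ownership of rseq registration.  */
static bool lib_owns_rseq;

/* Constructor side: take ownership only if nobody else handles it.  */
static void lib_init (void)
{
  if (!sketch_rseq_handled)
    {
      sketch_rseq_handled = 1;
      lib_owns_rseq = true;
      /* ... register rseq for the current thread here ... */
    }
}

/* Destructor side: release ownership only if this library took it.  */
static void lib_fini (void)
{
  if (lib_owns_rseq)
    {
      /* ... unregister rseq here ... */
      lib_owns_rseq = false;
      sketch_rseq_handled = 0;
    }
}
```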
diff --git a/sysdeps/unix/sysv/linux/x86/bits/rseq.h b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
new file mode 100644
index 0000000000..a2918c4617
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/x86/bits/rseq.h
@@ -0,0 +1,30 @@ 
+/* Restartable Sequences Linux x86 architecture header.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _SYS_RSEQ_H
+# error "Never use <bits/rseq.h> directly; include <sys/rseq.h> instead."
+#endif
+
+/* RSEQ_SIG is a signature required before each abort handler.
+
+   RSEQ_SIG is used with the following reserved undefined instructions, which
+   trap in user-space:
+
+   x86-32:    0f b9 3d 53 30 05 53      ud1    0x53053053,%edi
+   x86-64:    0f b9 3d 53 30 05 53      ud1    0x53053053(%rip),%edi  */
+
+#define RSEQ_SIG	0x53053053
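On x86, the signature is the 32-bit displacement of the ud1 instruction listed above. x86 encodes displacements little-endian, so the last four instruction bytes (53 30 05 53) decode to 0x53053053. These are the four bytes placed immediately before the abort handler. The sketch below decodes them. The array and helper names are illustrative.

```c
#include <stdint.h>

/* Bytes of "ud1 0x53053053(%rip),%edi" as listed above: opcode 0f b9,
   ModRM 3d (disp32 form, %edi), then the 4-byte displacement.  */
static const unsigned char x86_rseq_sig_insn[7] =
  { 0x0f, 0xb9, 0x3d, 0x53, 0x30, 0x05, 0x53 };

/* Little-endian 32-bit load, matching the x86 displacement encoding.  */
static uint32_t load_le32 (const unsigned char *b)
{
  return (uint32_t) b[0] | ((uint32_t) b[1] << 8)
         | ((uint32_t) b[2] << 16) | ((uint32_t) b[3] << 24);
}
```

Here load_le32 (x86_rseq_sig_insn + 3) yields RSEQ_SIG.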
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
index 86dfb0c94d..a834f65383 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libc.abilist
@@ -2050,6 +2050,8 @@  GLIBC_2.3.4 setipv4sourcefilter F
 GLIBC_2.3.4 setsourcefilter F
 GLIBC_2.3.4 xdr_quad_t F
 GLIBC_2.3.4 xdr_u_quad_t F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4
 GLIBC_2.4 __confstr_chk F
 GLIBC_2.4 __fgets_chk F
 GLIBC_2.4 __fgets_unlocked_chk F
diff --git a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
index dd688263aa..fb8417bde7 100644
--- a/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/x32/libc.abilist
@@ -2149,3 +2149,5 @@  GLIBC_2.28 thrd_yield F
 GLIBC_2.29 getcpu F
 GLIBC_2.29 posix_spawn_file_actions_addchdir_np F
 GLIBC_2.29 posix_spawn_file_actions_addfchdir_np F
+GLIBC_2.30 __rseq_abi T 0x20
+GLIBC_2.30 __rseq_handled D 0x4