doc: Clarify __builtin_return_address [PR94891]

Message ID 20200528163541.7262-1-szabolcs.nagy@arm.com
State Superseded
Series
  • doc: Clarify __builtin_return_address [PR94891]

Commit Message

Szabolcs Nagy May 28, 2020, 4:35 p.m.
The expected semantics and valid usage of __builtin_return_address are
not clear, since it exposes implementation internals that are normally
not meaningful to portable C code.

This documentation change tries to clarify the semantics in case the
return address is stored in a mangled form in memory, which affects
AArch64 when pointer authentication is used for the return address
(i.e. -mbranch-protection=pac-ret).

---

This is an RFC patch trying to address PR target/94891:

AArch64 __builtin_return_address currently returns the mangled
address even though user code generally cannot use such an address or
tell whether it is mangled. (So this patch will require AArch64 backend
changes.)

__builtin_extract_return_addr returns its argument unchanged on
AArch64. This could be changed, but the assumption that this operation
can be reversed by __builtin_frob_return_addr makes it unsuitable
for general unmangling (return address signing requires additional
input other than the code address).
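The round-trip assumption mentioned above can be checked directly. This is a minimal sketch, assuming GCC or Clang; the helper names are hypothetical. On most targets (x86_64, and AArch64 at the time of this patch) both builtins are the identity, so it only demonstrates the contract, not return address signing itself.

```c
#include <stdio.h>

/* Hypothetical helper: returns the raw __builtin_return_address (0) value,
   i.e. an address inside whichever function called it.  noinline keeps the
   call, and therefore the return address, real.  */
__attribute__((noinline)) static void *raw_return_address(void)
{
    return __builtin_return_address(0);
}

/* Hypothetical check of the documented relationship:
   __builtin_frob_return_addr undoes __builtin_extract_return_addr.
   Returns 1 if the round trip reproduces the raw value.  */
static int extract_frob_roundtrip_holds(void)
{
    void *raw = raw_return_address();
    void *code = __builtin_extract_return_addr(raw);
    void *back = __builtin_frob_return_addr(code);

    printf("raw=%p extracted=%p frobbed=%p\n", raw, code, back);
    return back == raw;
}
```

Both builtins take a single pointer, so a mangling that also depends on a signing modifier (as pac-ret does) could not be undone this way, which is the limitation described above.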

On AArch64 the return address mangling is part of the ABI between the
compiler and the unwinder/debugger: the unwind/debug info describes when
and how to unmangle the return address. This information may not be
available at runtime (e.g. without unwind tables), so user code cannot
handle a mangled return address in general. Currently the xpaclri
instruction always works and gives an unmangled address, but exposing
the mangled address to users would break existing code that uses
__builtin_return_address and would constrain the mangling ABI.

On AArch64 with the ILP32 ABI the return address is stored as 64 bits in
memory, but __builtin_return_address returns a 32-bit void *, so it cannot
be the same as the stored value if the top bits are used for mangling.
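The width argument can be made concrete with plain integer arithmetic. This sketch is illustrative only: the value in the top bits is a made-up stand-in for a signature, not a real pac-ret encoding, and uint32_t stands in for an ILP32 void *.

```c
#include <stdint.h>

/* Hypothetical 64-bit return address slot as stored in memory under ILP32:
   the low 32 bits hold the code address and the top bits are assumed to
   carry a (made-up) signature.  */
static const uint64_t stored_slot = 0x002a000000401000ULL;

/* A 32-bit pointer keeps only the low half of the stored slot; everything
   above bit 31 is dropped by the conversion.  */
static uint32_t as_ilp32_pointer(uint64_t slot)
{
    return (uint32_t)slot;
}

/* Returns 1 if the 32-bit value widens back to the stored slot, which is
   false whenever any of the top 32 bits are non-zero.  */
static int ilp32_value_matches_slot(uint64_t slot)
{
    return (uint64_t)as_ilp32_pointer(slot) == slot;
}
```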

It seems only the
  __builtin_extract_return_addr (__builtin_return_address (0))
usage was ever useful in portable code, so I think this usage should be
documented and the remaining semantics left to the target to decide.
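A minimal sketch of that portable usage with dladdr, assuming glibc (dladdr and Dl_info are GNU extensions, hence _GNU_SOURCE); the helper name caller_object_name is hypothetical. Only level 0 is used, so no -Wframe-address diagnostic applies.

```c
#define _GNU_SOURCE /* dladdr and Dl_info are GNU extensions in glibc */
#include <dlfcn.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: names the loaded object (executable or shared
   library) containing the code this function returns to.  noinline keeps
   a real call, so level 0 refers to this function's own caller.  */
__attribute__((noinline)) static const char *caller_object_name(void)
{
    /* Raw level-0 return address, then the target-specific
       post-processing that yields a plain code address.  */
    void *addr = __builtin_extract_return_addr(__builtin_return_address(0));
    Dl_info info;

    if (dladdr(addr, &info) == 0)
        return NULL; /* address is not inside any loaded object */

    /* dli_sname is often NULL for symbols the executable does not export.  */
    printf("return address %p is in %s (symbol %s)\n", addr,
           info.dli_fname ? info.dli_fname : "?",
           info.dli_sname ? info.dli_sname : "?");
    return info.dli_fname;
}
```

With glibc older than 2.34, dladdr lives in libdl, so the program must be linked with -ldl.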
---
 gcc/doc/extend.texi | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

-- 
2.17.1

Comments

Kees Cook via Gcc-patches May 28, 2020, 4:50 p.m. | #1
* Szabolcs Nagy:

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index cced19d2018..0fd32a22599 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi

>  On some machines it may be impossible to determine the return address of
>  any function other than the current one; in such cases, or when the top
> +of the stack has been reached, this function returns an unspecified
> +value.

Can it crash as well?  But that's a pre-existing issue with the wording.

>  Additional post-processing of the returned value may be needed, see
>  @code{__builtin_extract_return_addr}.
> 
> +The stored representation of the return address in memory may be different
> +from the address returned by @code{__builtin_return_address}.  For example
> +on AArch64 the stored address may be mangled with return address signing.
> +
>  Calling this function with a nonzero argument can have unpredictable
>  effects, including crashing the calling program.  As a result, calls
>  that are considered unsafe are diagnosed when the @option{-Wframe-address}
>  option is in effect.  Such calls should only be made in debugging
>  situations.
> +
> +On targets where code addresses are representable as @code{void *},
> +@smallexample
> +void *addr = __builtin_extract_return_addr (__builtin_return_address (0))
> +@end smallexample
> +gives the code address where the current function would return.  For example
> +such address may be used with @code{dladdr} or other interfaces that work
> +with code addresses.
>  @end deftypefn

The change looks reasonable to me.  It is worded in such a way that it
covers architectures which use function descriptors.

Thanks,
Florian

Patch

diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
index cced19d2018..0fd32a22599 100644
--- a/gcc/doc/extend.texi
+++ b/gcc/doc/extend.texi
@@ -11151,18 +11151,30 @@  The @var{level} argument must be a constant integer.
 
 On some machines it may be impossible to determine the return address of
 any function other than the current one; in such cases, or when the top
-of the stack has been reached, this function returns @code{0} or a
-random value.  In addition, @code{__builtin_frame_address} may be used
+of the stack has been reached, this function returns an unspecified
+value.  In addition, @code{__builtin_frame_address} may be used
 to determine if the top of the stack has been reached.
 
 Additional post-processing of the returned value may be needed, see
 @code{__builtin_extract_return_addr}.
 
+The stored representation of the return address in memory may be different
+from the address returned by @code{__builtin_return_address}.  For example
+on AArch64 the stored address may be mangled with return address signing.
+
 Calling this function with a nonzero argument can have unpredictable
 effects, including crashing the calling program.  As a result, calls
 that are considered unsafe are diagnosed when the @option{-Wframe-address}
 option is in effect.  Such calls should only be made in debugging
 situations.
+
+On targets where code addresses are representable as @code{void *},
+@smallexample
+void *addr = __builtin_extract_return_addr (__builtin_return_address (0));
+@end smallexample
+gives the code address where the current function would return.  For example,
+such an address may be used with @code{dladdr} or other interfaces that work
+with code addresses.
 @end deftypefn
 
 @deftypefn {Built-in Function} {void *} __builtin_extract_return_addr (void *@var{addr})