as: How to determine the section of symbols

Message ID 20190715144844.GJ3419@hirez.programming.kicks-ass.net
State New
Headers show
Series
  • as: How to determine the section of symbols
Related show

Commit Message

Peter Zijlstra July 15, 2019, 2:48 p.m.
Hi all,

I'm trying to 'optimize' the Linux Kernel x86 jump_label support. That
is; jump_label or static_branch() is a Linux Kernel construct build upon
GCC asm goto and provides for self-modifying code based branches.
Regular execution will only see unconditional branches or nops.

Currently, on x86, we patch between jmp.d32 or nop5 and this works well.

The quest is to also allow usage of jmp.d8 (and the matching nop2) where
the displacement allows for this.

The below patch is the last of a series that implements this and
contains all the relevant bits to this discussion, and is subtly broken.

The problem is that the labels GCC hands to the asm goto () can be in
different sections (namely .text and .text.unlikely), and the GAS manual
sayeth:

 [ https://sourceware.org/binutils/docs-2.27/as/Infix-Ops.html#Infix-Ops ]

 - Subtraction. If the right argument is absolute, the result has the
   section of the left argument. If both arguments are in the same
   section, the result is absolute. You may not subtract arguments from
   different sections.

Funnily this does not result in a compile/assemble time error :-(, it
seems to emit a MOP5 but then at runtime explodes because the actual
displacement (after linking etc..) ends up fitting in a d8 and then the
actual code and the expected code don't match up at code patching time
and we BUG.

If I were to be able to reliably detect this section mismatch I could
encode it in the JUMP_TABLE_ENTRY (__jump_table section).

Any clues on how I can (best) fix this; even if it involves writing a
GAS patch that'd be fine, we can have this functionality depend on a
binutils version.

Thanks!

  ~ Peter Zijlstra


Full patch series here:

  https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/log/?h=locking/jump_label


---
Subject: jump_label, x86: Enable JMP8/NOP2 support
From: Peter Zijlstra <peterz@infradead.org>

Date: Fri Jun 28 10:20:55 CEST 2019

Enable and emit short JMP/NOP jump_label entries.

Much thanks to Josh for (re)discovering the .skip trick to
conditionally emit variable length text.

Due to how early we enable jump_labels on x86, if any of this comes
apart, the machine is completely dead. Qemu+GDB saved the day this
time.

Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>

---
 arch/x86/include/asm/jump_label.h |   37 +++++++++++++++++++++++++++++++------
 arch/x86/kernel/jump_label.c      |    5 ++++-
 2 files changed, 35 insertions(+), 7 deletions(-)

Comments

H.J. Lu July 15, 2019, 8:10 p.m. | #1
On Mon, Jul 15, 2019 at 7:48 AM Peter Zijlstra <peterz@infradead.org> wrote:
>

> Hi all,

>

> I'm trying to 'optimize' the Linux Kernel x86 jump_label support. That

> is; jump_label or static_branch() is a Linux Kernel construct build upon

> GCC asm goto and provides for self-modifying code based branches.

> Regular execution will only see unconditional branches or nops.

>

> Currently, on x86, we patch between jmp.d32 or nop5 and this works well.

>

> The quest is to also allow usage of jmp.d8 (and the matching nop2) where

> the displacement allows for this.

>

> The below patch is the last of a series that implements this and

> contains all the relevant bits to this discussion, and is subtly broken.

>

> The problem is that the labels GCC hands to the asm goto () can be in

> different sections (namely .text and .text.unlikely), and the GAS manual

> sayeth:

>

>  [ https://sourceware.org/binutils/docs-2.27/as/Infix-Ops.html#Infix-Ops ]

>

>  - Subtraction. If the right argument is absolute, the result has the

>    section of the left argument. If both arguments are in the same

>    section, the result is absolute. You may not subtract arguments from

>    different sections.

>

> Funnily this does not result in a compile/assemble time error :-(, it

> seems to emit a MOP5 but then at runtime explodes because the actual

> displacement (after linking etc..) ends up fitting in a d8 and then the

> actual code and the expected code don't match up at code patching time

> and we BUG.

>

> If I were to be able to reliably detect this section mismatch I could

> encode it in the JUMP_TABLE_ENTRY (__jump_table section).

>

> Any clues on how I can (best) fix this; even if it involves writing a

> GAS patch that'd be fine, we can have this functionality depend on a

> binutils version.

>


.d8 is only a hint.  Is that possible to use the new ".nops SIZE" directive
where SIZE can be an expression.

-- 
H.J.
Peter Zijlstra July 16, 2019, 8:43 a.m. | #2
On Mon, Jul 15, 2019 at 01:10:42PM -0700, H.J. Lu wrote:
> On Mon, Jul 15, 2019 at 7:48 AM Peter Zijlstra <peterz@infradead.org> wrote:

> >

> > Hi all,

> >

> > I'm trying to 'optimize' the Linux Kernel x86 jump_label support. That

> > is; jump_label or static_branch() is a Linux Kernel construct build upon

> > GCC asm goto and provides for self-modifying code based branches.

> > Regular execution will only see unconditional branches or nops.

> >

> > Currently, on x86, we patch between jmp.d32 or nop5 and this works well.

> >

> > The quest is to also allow usage of jmp.d8 (and the matching nop2) where

> > the displacement allows for this.

> >

> > The below patch is the last of a series that implements this and

> > contains all the relevant bits to this discussion, and is subtly broken.

> >

> > The problem is that the labels GCC hands to the asm goto () can be in

> > different sections (namely .text and .text.unlikely), and the GAS manual

> > sayeth:

> >

> >  [ https://sourceware.org/binutils/docs-2.27/as/Infix-Ops.html#Infix-Ops ]

> >

> >  - Subtraction. If the right argument is absolute, the result has the

> >    section of the left argument. If both arguments are in the same

> >    section, the result is absolute. You may not subtract arguments from

> >    different sections.

> >

> > Funnily this does not result in a compile/assemble time error :-(, it

> > seems to emit a MOP5 but then at runtime explodes because the actual

> > displacement (after linking etc..) ends up fitting in a d8 and then the

> > actual code and the expected code don't match up at code patching time

> > and we BUG.

> >

> > If I were to be able to reliably detect this section mismatch I could

> > encode it in the JUMP_TABLE_ENTRY (__jump_table section).

> >

> > Any clues on how I can (best) fix this; even if it involves writing a

> > GAS patch that'd be fine, we can have this functionality depend on a

> > binutils version.

> >

> 

> .d8 is only a hint.  Is that possible to use the new ".nops SIZE" directive

> where SIZE can be an expression.


The problem appears to be constructing an expression that yields the
exact same semantics as jmp. Given that GCC might provide us with a
label into another section, we cannot (per the above as documentation)
compute a displacement. Or ever detect this case.

Also, is there any control over what NOPs are emitted with .nops ? There
is a hard requirement the nop5 and nop2 are single instructions.
Peter Zijlstra July 16, 2019, 10:07 a.m. | #3
On Tue, Jul 16, 2019 at 10:43:27AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2019 at 01:10:42PM -0700, H.J. Lu wrote:

> > On Mon, Jul 15, 2019 at 7:48 AM Peter Zijlstra <peterz@infradead.org> wrote:

> > >

> > > Hi all,

> > >

> > > I'm trying to 'optimize' the Linux Kernel x86 jump_label support. That

> > > is; jump_label or static_branch() is a Linux Kernel construct build upon

> > > GCC asm goto and provides for self-modifying code based branches.

> > > Regular execution will only see unconditional branches or nops.

> > >

> > > Currently, on x86, we patch between jmp.d32 or nop5 and this works well.

> > >

> > > The quest is to also allow usage of jmp.d8 (and the matching nop2) where

> > > the displacement allows for this.

> > >

> > > The below patch is the last of a series that implements this and

> > > contains all the relevant bits to this discussion, and is subtly broken.

> > >

> > > The problem is that the labels GCC hands to the asm goto () can be in

> > > different sections (namely .text and .text.unlikely), and the GAS manual

> > > sayeth:

> > >

> > >  [ https://sourceware.org/binutils/docs-2.27/as/Infix-Ops.html#Infix-Ops ]

> > >

> > >  - Subtraction. If the right argument is absolute, the result has the

> > >    section of the left argument. If both arguments are in the same

> > >    section, the result is absolute. You may not subtract arguments from

> > >    different sections.

> > >

> > > Funnily this does not result in a compile/assemble time error :-(, it

> > > seems to emit a MOP5 but then at runtime explodes because the actual

> > > displacement (after linking etc..) ends up fitting in a d8 and then the

> > > actual code and the expected code don't match up at code patching time

> > > and we BUG.

> > >

> > > If I were to be able to reliably detect this section mismatch I could

> > > encode it in the JUMP_TABLE_ENTRY (__jump_table section).

> > >

> > > Any clues on how I can (best) fix this; even if it involves writing a

> > > GAS patch that'd be fine, we can have this functionality depend on a

> > > binutils version.

> > >

> > 

> > .d8 is only a hint.  Is that possible to use the new ".nops SIZE" directive

> > where SIZE can be an expression.

> 

> The problem appears to be constructing an expression that yields the

> exact same semantics as jmp. Given that GCC might provide us with a

> label into another section, we cannot (per the above as documentation)

> compute a displacement. Or ever detect this case.


Also, 'funnily' when you add:

	".long disp"

to emit the calculated displacement, you do get an assembly error, but
the (indirect) usage in .skip doesn't trigger this.

This also inhibits emitting the actual jmp instruction the same way we
emit the nop case, which greatly complicates storing the size in our
jump entry table.

With this patchlet on top of the tree I pointed to earlier:

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 663ec7a1f19f..21e0b74d8d5f 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -37,6 +37,9 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 		".set is_byte, -res \n\t"
 		".set is_long, -(~res) \n\t"
 
+		".byte 0xe9\n\t"
+		".long disp - 3\n\t"
+
 #ifdef CONFIG_X86_64
 		".skip is_byte, 0x66 \n\t"
 		".skip is_byte, 0x90 \n\t"

(using x86_64-defconfig, gcc (Debian 8.3.0-6) 8.3.0)

$ make O=defconfig-build/ drivers/usb/host/xhci.o
...
/tmp/user/0/cc4sG7aW.s: Assembler messages:
/tmp/user/0/cc4sG7aW.s: Error: invalid operands (.text.unlikely and .text sections) for `-' when setting `disp'

But without that patchlet on top it builds just fine:


Relocation section '.rela__jump_table' at offset 0xec40 contains 63 entries:
Offset          Info           Type           Sym. Value    Sym. Name + Addend
...
000000000130  000200000002 R_X86_64_PC32     0000000000000000 .text + 6284
000000000134  001b00000002 R_X86_64_PC32     0000000000000000 .text.unlikely + b6f
000000000138  013900000018 R_X86_64_PC64     0000000000000000 __tracepoint_xhci_addr + 8
...


.text

0000000000005cb0 <xhci_setup_device>:
...
6284:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1)
6289:       e9 00 00 00 00          jmpq   628e <xhci_setup_device+0x5de>
		628a: R_X86_64_PC32     .text.unlikely+0xb8d


.text.unlikely

0000000000000b0f <xhci_setup_device.cold.98>:
...
 b6f:   4d 8b 65 08             mov    0x8(%r13),%r12
 b73:   65 8b 05 00 00 00 00    mov    %gs:0x0(%rip),%eax        # b7a <xhci_setup_device.cold.98+0x6b>
		 b76: R_X86_64_PC32      cpu_number-0x4


So even though:

	".set disp, %l[l_yes] - (1b + 2) \n\t"
	".set res, (disp >> 31) == (disp >> 7) \n\t"
	".set is_byte, -res \n\t"
	".set is_long, -(~res) \n\t"

is strictly undefined behaviour per the as documentation, we end up
with is_byte=0 and is_long=1 and emit:

	".skip is_long, 0x0f \n\t"
	".skip is_long, 0x1f \n\t"
	".skip is_long, 0x44 \n\t"
	".skip is_long, 0x00 \n\t"
	".skip is_long, 0x00 \n\t"

as observed in the objdump output, but we cannot, per the assmebler
error earlier, use 'disp'.

Colour me confused and frustrated.


Anyway, this seems to actually work:

	".nops (2*is_byte) + (5*is_long)\n\t"

and for the single case (above) I checked it emits the same nop5.
Peter Zijlstra July 16, 2019, 10:56 a.m. | #4
On Tue, Jul 16, 2019 at 12:07:10PM +0200, Peter Zijlstra wrote:

> Also, 'funnily' when you add:

> 

> 	".long disp"

> 

> to emit the calculated displacement, you do get an assembly error, but

> the (indirect) usage in .skip doesn't trigger this.


So given that when we cross sections, is_long=1 for as yet unspecified
raisins, I figured I'd try and store that in our table and use it as a
size override, but that got me assmeble errors *again* :/


/tmp/ccjvEgTu.s: Assembler messages:
/tmp/ccjvEgTu.s: Error: invalid operands (.text.unlikely and .text sections) for `-' when setting `disp'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
/tmp/ccjvEgTu.s: Error: invalid operands (.text.unlikely and .text sections) for `-' when setting `disp'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
make[3]: *** [../scripts/Makefile.build:279: arch/x86/kvm/vmx/nested.o] Error 1
make[3]: *** Waiting for unfinished jobs....
../arch/x86/include/asm/jump_label.h: Assembler messages:
../arch/x86/include/asm/jump_label.h:37: Warning: .space repeat count is zero, ignored
../arch/x86/include/asm/jump_label.h:38: Warning: .space repeat count is zero, ignored
../arch/x86/include/asm/jump_label.h:37: Warning: .space repeat count is zero, ignored
../arch/x86/include/asm/jump_label.h:38: Warning: .space repeat count is zero, ignored
/tmp/ccmK11Jw.s: Error: invalid operands (.text.unlikely and .text sections) for `-' when setting `disp'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
/tmp/ccmK11Jw.s: Error: invalid operands (.text.unlikely and .text sections) for `-' when setting `disp'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
/tmp/ccmK11Jw.s: Error: invalid operands (.text.unlikely and .text sections) for `-' when setting `disp'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
../arch/x86/include/asm/jump_label.h:34: Error: invalid operands (.text.unlikely and *ABS* sections) for `>>'
make[3]: *** [../scripts/Makefile.build:279: arch/x86/kvm/vmx/vmx.o] Error 1
make[2]: *** [../scripts/Makefile.build:489: arch/x86/kvm] Error 2
make[1]: *** [/usr/src/linux-2.6/Makefile:1071: arch/x86] Error 2
make: *** [Makefile:179: sub-make] Error 2


---
diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 663ec7a1f19f..ecff560958e2 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -25,7 +25,7 @@
 	_ASM_ALIGN "\n\t"				\
 	".long 1b - . \n\t"				\
 	".long %l[l_yes] - . \n\t"			\
-	_ASM_PTR "%c0 + %c1 - .\n\t"			\
+	_ASM_PTR "%c0 + %c1 + (4*is_long) - .\n\t"	\
 	".popsection \n\t"
 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
@@ -70,7 +70,14 @@ static __always_inline bool arch_static_branch(struct static_key *key, bool bran
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
 	asm_volatile_goto("1:"
+
+		".set disp, %l[l_yes] - (1b + 2) \n\t"
+		".set res, (disp >> 31) == (disp >> 7) \n\t"
+		".set is_byte, -res \n\t"
+		".set is_long, -(~res) \n\t"
+
 		"jmp %l[l_yes] \n\t"
+
 		JUMP_TABLE_ENTRY
 		: :  "i" (key), "i" (branch) : : l_yes);
 
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 22379f62f0f7..9aba802e593f 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -36,7 +36,7 @@ static inline bool __jump_disp_is_byte(s32 disp)
 int arch_jump_entry_size(struct jump_entry *entry)
 {
 	s32 disp = jump_entry_target(entry) - jump_entry_code(entry);
-	if (__jump_disp_is_byte(disp))
+	if (!jump_entry_is_arch(entry) && __jump_disp_is_byte(disp))
 		return JMP8_INSN_SIZE;
 	return JMP32_INSN_SIZE;
 }
diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 5cafdb7c3e9c..e97e40a3bb81 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -98,13 +98,14 @@ struct static_key {
  *	    0 if initially false
  * bit 1 => 1 if points to struct static_key_mod
  *	    0 if points to struct jump_entry
+ * bit 2 -- arch defined
  */
 	union {
 		unsigned long type;
 		struct jump_entry *entries;
 		struct static_key_mod *next;
 	};
-};
+} __aligned(1<<3);
 
 #else
 struct static_key {
@@ -137,7 +138,7 @@ static inline unsigned long jump_entry_target(const struct jump_entry *entry)
 
 static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
 {
-	long offset = entry->key & ~3L;
+	long offset = entry->key & ~7L;
 
 	return (struct static_key *)((unsigned long)&entry->key + offset);
 }
@@ -156,7 +157,7 @@ static inline unsigned long jump_entry_target(const struct jump_entry *entry)
 
 static inline struct static_key *jump_entry_key(const struct jump_entry *entry)
 {
-	return (struct static_key *)((unsigned long)entry->key & ~3UL);
+	return (struct static_key *)((unsigned long)entry->key & ~7UL);
 }
 
 #endif
@@ -171,6 +172,11 @@ static inline bool jump_entry_is_init(const struct jump_entry *entry)
 	return (unsigned long)entry->key & 2UL;
 }
 
+static inline bool jump_entry_is_arch(const struct jump_entry *entry)
+{
+	return (unsigned long)entry->key & 4UL;
+}
+
 static inline void jump_entry_set_init(struct jump_entry *entry)
 {
 	entry->key |= 2;
Peter Zijlstra July 16, 2019, 1:43 p.m. | #5
On Tue, Jul 16, 2019 at 12:56:37PM +0200, Peter Zijlstra wrote:
> ../arch/x86/include/asm/jump_label.h: Assembler messages:

> ../arch/x86/include/asm/jump_label.h:37: Warning: .space repeat count is zero, ignored

> ../arch/x86/include/asm/jump_label.h:38: Warning: .space repeat count is zero, ignored

> ../arch/x86/include/asm/jump_label.h:37: Warning: .space repeat count is zero, ignored

> ../arch/x86/include/asm/jump_label.h:38: Warning: .space repeat count is zero, ignored


This one is another puzzle, it comes from:

arch/x86/kvm/vmx/vmx.o (which for some reason we cannot build directly,
and requires CONFIG_KVM_INTEL=y which is not part of x86_64-defconfig).

But I cannot quite figure out where/why.

$ readelf -a defconfig-build/arch/x86/kvm/vmx/vmx.o |
  awk '/^Relocation section/ { p=0 }
       /^Relocation section.*\.rela__jump_table/ { p=1; getline; getline }
       { if (!p) { next }
         if ($0 ~ /^$/) { p=0; next }
	 A=$0; a=$5; getline;
	 B=$0; b=$5; getline;
	 if (a != b) { print A; print B; print $0; }
	 }'

000000000100  000200000002 R_X86_64_PC32     0000000000000000 .text + 670
000000000104  000800000002 R_X86_64_PC32     0000000000000000 .text.unlikely + 1b
000000000108  016100000018 R_X86_64_PC64     0000000000000000 enable_evmcs + 0
000000001940  000200000002 R_X86_64_PC32     0000000000000000 .text + b688
000000001944  000800000002 R_X86_64_PC32     0000000000000000 .text.unlikely + 991
000000001948  016100000018 R_X86_64_PC64     0000000000000000 enable_evmcs + 0
000000001950  000200000002 R_X86_64_PC32     0000000000000000 .text + b6a1
000000001954  000800000002 R_X86_64_PC32     0000000000000000 .text.unlikely + 26b
000000001958  016100000018 R_X86_64_PC64     0000000000000000 enable_evmcs + 0

Which is _3_ cross-section jump entries, but there's only 2 warnings.

It does go away with:

	".nops (2*is_byte) + (5*is_long) \n\t"

though, and again, all 3 turn into nop5, even though this is UB.
Peter Zijlstra July 18, 2019, 11:28 a.m. | #6
On Tue, Jul 16, 2019 at 10:43:27AM +0200, Peter Zijlstra wrote:
> On Mon, Jul 15, 2019 at 01:10:42PM -0700, H.J. Lu wrote:


> > .d8 is only a hint.  Is that possible to use the new ".nops SIZE" directive

> > where SIZE can be an expression.

> 

> The problem appears to be constructing an expression that yields the

> exact same semantics as jmp. Given that GCC might provide us with a

> label into another section, we cannot (per the above as documentation)

> compute a displacement. Or ever detect this case.


Would it be possible to implement a new mnemonic "nopjmp" that acts the
exact same as "jmp" but emits either nop2/nop5 instead of actual jumps?

Then I'll need to change the kernel to read the instruction to determine
size, but at least the nops and jmps would be consistently sized.
H.J. Lu July 18, 2019, 8:39 p.m. | #7
On Thu, Jul 18, 2019 at 4:28 AM Peter Zijlstra <peterz@infradead.org> wrote:
>

> On Tue, Jul 16, 2019 at 10:43:27AM +0200, Peter Zijlstra wrote:

> > On Mon, Jul 15, 2019 at 01:10:42PM -0700, H.J. Lu wrote:

>

> > > .d8 is only a hint.  Is that possible to use the new ".nops SIZE" directive

> > > where SIZE can be an expression.

> >

> > The problem appears to be constructing an expression that yields the

> > exact same semantics as jmp. Given that GCC might provide us with a

> > label into another section, we cannot (per the above as documentation)

> > compute a displacement. Or ever detect this case.

>

> Would it be possible to implement a new mnemonic "nopjmp" that acts the

> exact same as "jmp" but emits either nop2/nop5 instead of actual jumps?

>

> Then I'll need to change the kernel to read the instruction to determine

> size, but at least the nops and jmps would be consistently sized.


How much reduction in kernel size does it provide?  Do you have a small real
usage example in assembly code?

-- 
H.J.
Peter Zijlstra July 19, 2019, 7:57 a.m. | #8
On Thu, Jul 18, 2019 at 01:39:12PM -0700, H.J. Lu wrote:
> On Thu, Jul 18, 2019 at 4:28 AM Peter Zijlstra <peterz@infradead.org> wrote:

> >

> > On Tue, Jul 16, 2019 at 10:43:27AM +0200, Peter Zijlstra wrote:

> > > On Mon, Jul 15, 2019 at 01:10:42PM -0700, H.J. Lu wrote:

> >

> > > > .d8 is only a hint.  Is that possible to use the new ".nops SIZE" directive

> > > > where SIZE can be an expression.

> > >

> > > The problem appears to be constructing an expression that yields the

> > > exact same semantics as jmp. Given that GCC might provide us with a

> > > label into another section, we cannot (per the above as documentation)

> > > compute a displacement. Or ever detect this case.

> >

> > Would it be possible to implement a new mnemonic "nopjmp" that acts the

> > exact same as "jmp" but emits either nop2/nop5 instead of actual jumps?

> >

> > Then I'll need to change the kernel to read the instruction to determine

> > size, but at least the nops and jmps would be consistently sized.

> 

> How much reduction in kernel size does it provide?  Do you have a small real

> usage example in assembly code?


Drops a x86_64-defconfig build by about 6k. Let me go find a simple
case.

Patch

--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -31,7 +31,34 @@ 
 static __always_inline bool arch_static_branch(struct static_key *key, bool branch)
 {
 	asm_volatile_goto("1:"
-		".byte " __stringify(STATIC_KEY_INIT_NOP) "\n\t"
+
+		".set disp, %l[l_yes] - (1b + 2) \n\t"
+		".set res, (disp >> 31) == (disp >> 7) \n\t"
+		".set is_byte, -res \n\t"
+		".set is_long, -(~res) \n\t"
+
+#ifdef CONFIG_X86_64
+		".skip is_byte, 0x66 \n\t"
+		".skip is_byte, 0x90 \n\t"
+#else
+		".skip is_byte, 0x89 \n\t"
+		".skip is_byte, 0xf6 \n\t"
+#endif
+
+#ifdef CONFIG_X86_64
+		".skip is_long, 0x0f \n\t"
+		".skip is_long, 0x1f \n\t"
+		".skip is_long, 0x44 \n\t"
+		".skip is_long, 0x00 \n\t"
+		".skip is_long, 0x00 \n\t"
+#else
+		".skip is_long, 0x3e \n\t"
+		".skip is_long, 0x8d \n\t"
+		".skip is_long, 0x74 \n\t"
+		".skip is_long, 0x26 \n\t"
+		".skip is_long, 0x00 \n\t"
+#endif
+
 		JUMP_TABLE_ENTRY
 		: :  "i" (key), "i" (branch) : : l_yes);
 
@@ -43,8 +71,7 @@  static __always_inline bool arch_static_
 static __always_inline bool arch_static_branch_jump(struct static_key *key, bool branch)
 {
 	asm_volatile_goto("1:"
-		".byte 0xe9 \n\t"
-		".long %l[l_yes] - (. + 4) \n\t"
+		"jmp %l[l_yes] \n\t"
 		JUMP_TABLE_ENTRY
 		: :  "i" (key), "i" (branch) : : l_yes);
 
@@ -59,9 +86,7 @@  extern int arch_jump_entry_size(struct j
 
 .macro STATIC_BRANCH_FALSE_LIKELY target, key
 .Lstatic_jump_\@:
-	/* Equivalent to "jmp.d32 \target" */
-	.byte		0xe9
-	.long		\target - (. + 4)
+	jmp \target
 
 	.pushsection __jump_table, "aw"
 	_ASM_ALIGN
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -29,7 +29,8 @@  union jump_code_union {
 
 static inline bool __jump_disp_is_byte(s32 disp)
 {
-	return false;
+	disp -= JMP8_INSN_SIZE;
+	return (disp >> 31) == (disp >> 7);
 }
 
 int arch_jump_entry_size(struct jump_entry *entry)