[GCN] Fix handling of VCC_CONDITIONAL_REG

Message ID 8acdba25-576b-19d6-71b8-d6ca84ec675c@codesourcery.com
State New
Headers show
Series
  • [GCN] Fix handling of VCC_CONDITIONAL_REG
Related show

Commit Message

Kwok Cheung Yeung Nov. 14, 2019, 12:43 p.m.
Hello

This patch fixes an issue seen in the following test cases on AMD GCN:

libgomp.oacc-fortran/gemm.f90
libgomp.oacc-fortran/gemm-2.f90
libgomp.c/for-5-test_ttdpfs_ds128_auto.c
libgomp.c/for-5-test_ttdpfs_ds128_guided32.c
libgomp.c/for-5-test_ttdpfs_ds128_runtime.c
libgomp.c/for-5-test_ttdpfs_ds128_static.c
libgomp.c/for-5-test_ttdpfs_ds128_static32.c
libgomp.c/for-6-test_tdpfs_ds128_auto.c
libgomp.c/for-6-test_tdpfs_ds128_guided32.c
libgomp.c/for-6-test_tdpfs_ds128_runtime.c
libgomp.c/for-6-test_tdpfs_ds128_static.c
libgomp.c/for-6-test_tdpfs_ds128_static32.c
libgomp.c-c++-common/for-5.c
libgomp.c-c++-common/for-6.c

The compiler is generating code like this:

     v_cmp_gt_i32    vcc, s3, v1
     v_writelane_b32    v3, vcc_lo, 0    ; Move VCC to v[3:4]
     v_writelane_b32    v4, vcc_hi, 0
     ...
     v_cmp_eq_f32    vcc, v20, v0    ; Clobber VCC
     ...
     ; Move old VCC_LO into s62 - this is okay as v3 has not
     ; been clobbered.
     v_readlane_b32    s62, v3, 0

     ; Move old VCC_HI into s63 - this is _not_ okay as VCC_HI
     ; has been clobbered by the intervening v_cmp_eq_f32 instruction.
     s_mov_b32    s63, vcc_hi

GCC is failing to notice that the v_cmp_eq_f32 instruction clobbers both 
vcc_lo and vcc_hi. This is because gcn_hard_regno_nregs uses 
REGNO_REG_CLASS to determine the register class of the queried hard reg, 
but vcc_lo and vcc_hi are classified as GENERAL_REGS by 
gcn_regno_reg_class. gcn_hard_regno_nregs therefore returns 1 for the 
BImode store into vcc, so vcc_hi is regarded as untouched by the operation.

This is fixed by making gcn_regno_reg_class classify vcc_lo and vcc_hi 
into VCC_CONDITIONAL_REG (REGNO_REG_CLASS is supposed to return the 
smallest class anyway). I have also added a spill class for 
VCC_CONDITIONAL_REG (into SGPR_REGS) to avoid expensive spills into memory.

Built for and tested on the AMD GCN target with no regressions.

Okay for trunk?

Kwok


2019-11-14  Kwok Cheung Yeung  <kcy@codesourcery.com>

	gcc/
	* config/gcn/gcn.c (gcn_regno_reg_class): Return VCC_CONDITIONAL_REG
	register class for VCC_LO and VCC_HI.
	(gcn_spill_class): Use SGPR_REGS to spill registers in
	VCC_CONDITIONAL_REG.
---
  gcc/config/gcn/gcn.c | 6 +++++-
  1 file changed, 5 insertions(+), 1 deletion(-)

-- 
2.8.1

Comments

Andrew Stubbs Nov. 14, 2019, 1:18 p.m. | #1
On 14/11/2019 12:43, Kwok Cheung Yeung wrote:
> Hello

> 

> This patch fixes an issue seen in the following test cases on AMD GCN:

> 

> libgomp.oacc-fortran/gemm.f90

> libgomp.oacc-fortran/gemm-2.f90

> libgomp.c/for-5-test_ttdpfs_ds128_auto.c

> libgomp.c/for-5-test_ttdpfs_ds128_guided32.c

> libgomp.c/for-5-test_ttdpfs_ds128_runtime.c

> libgomp.c/for-5-test_ttdpfs_ds128_static.c

> libgomp.c/for-5-test_ttdpfs_ds128_static32.c

> libgomp.c/for-6-test_tdpfs_ds128_auto.c

> libgomp.c/for-6-test_tdpfs_ds128_guided32.c

> libgomp.c/for-6-test_tdpfs_ds128_runtime.c

> libgomp.c/for-6-test_tdpfs_ds128_static.c

> libgomp.c/for-6-test_tdpfs_ds128_static32.c

> libgomp.c-c++-common/for-5.c

> libgomp.c-c++-common/for-6.c

> 

> The compiler is generating code like this:

> 

>      v_cmp_gt_i32    vcc, s3, v1

>      v_writelane_b32    v3, vcc_lo, 0    ; Move VCC to v[3:4]

>      v_writelane_b32    v4, vcc_hi, 0

>      ...

>      v_cmp_eq_f32    vcc, v20, v0    ; Clobber VCC

>      ...

>      ; Move old VCC_LO into s62 - this is okay as v3 has not

>      ; been clobbered.

>      v_readlane_b32    s62, v3, 0

> 

>      ; Move old VCC_HI into s63 - this is _not_ okay as VCC_HI

>      ; has been clobbered by the intervening v_cmp_eq_f32 instruction.

>      s_mov_b32    s63, vcc_hi

> 

> GCC is failing to notice that the v_cmp_eq_f32 instruction clobbers both 

> vcc_lo and vcc_hi. This is because gcn_hard_regno_nregs uses 

> REGNO_REG_CLASS to determine the register class of the queried hard reg, 

> but vcc_lo and vcc_hi are classified as GENERAL_REGS by 

> gcn_regno_reg_class. gcn_hard_regno_nregs therefore returns 1 for the 

> BImode store into vcc, so vcc_hi is regarded as untouched by the operation.

> 

> This is fixed by making gcn_regno_reg_class classify vcc_lo and vcc_hi 

> into VCC_CONDITIONAL_REG (REGNO_REG_CLASS is supposed to return the 

> smallest class anyway). I have also added a spill class for 

> VCC_CONDITIONAL_REG (into SGPR_REGS) to avoid expensive spills into memory.

> 

> Built for and tested on the AMD GCN target with no regressions.

> 

> Okay for trunk?

> 

> Kwok

> 

> 

> 2019-11-14  Kwok Cheung Yeung  <kcy@codesourcery.com>

> 

>      gcc/

>      * config/gcn/gcn.c (gcn_regno_reg_class): Return VCC_CONDITIONAL_REG

>      register class for VCC_LO and VCC_HI.

>      (gcn_spill_class): Use SGPR_REGS to spill registers in

>      VCC_CONDITIONAL_REG.

> ---

>   gcc/config/gcn/gcn.c | 6 +++++-

>   1 file changed, 5 insertions(+), 1 deletion(-)

> 

> diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c

> index 976843b..dd89c26 100644

> --- a/gcc/config/gcn/gcn.c

> +++ b/gcc/config/gcn/gcn.c

> @@ -470,6 +470,9 @@ gcn_regno_reg_class (int regno)

>       {

>       case SCC_REG:

>         return SCC_CONDITIONAL_REG;

> +    case VCC_LO_REG:

> +    case VCC_HI_REG:

> +      return VCC_CONDITIONAL_REG;

>       case VCCZ_REG:

>         return VCCZ_CONDITIONAL_REG;

>       case EXECZ_REG:

> @@ -637,7 +640,8 @@ gcn_can_split_p (machine_mode, rtx op)

>   static reg_class_t

>   gcn_spill_class (reg_class_t c, machine_mode /*mode */ )

>   {

> -  if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c))

> +  if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c)

> +      || c == VCC_CONDITIONAL_REG)

>       return SGPR_REGS;

>     else

>       return NO_REGS;


OK.

Andrew

Patch

diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index 976843b..dd89c26 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -470,6 +470,9 @@  gcn_regno_reg_class (int regno)
      {
      case SCC_REG:
        return SCC_CONDITIONAL_REG;
+    case VCC_LO_REG:
+    case VCC_HI_REG:
+      return VCC_CONDITIONAL_REG;
      case VCCZ_REG:
        return VCCZ_CONDITIONAL_REG;
      case EXECZ_REG:
@@ -637,7 +640,8 @@  gcn_can_split_p (machine_mode, rtx op)
  static reg_class_t
  gcn_spill_class (reg_class_t c, machine_mode /*mode */ )
  {
-  if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c))
+  if (reg_classes_intersect_p (ALL_CONDITIONAL_REGS, c)
+      || c == VCC_CONDITIONAL_REG)
      return SGPR_REGS;
    else
      return NO_REGS;