Message ID  5B4F21E0.3060307@foss.arm.com 

State  New 
Hi Kyrill,

> On 18.07.2018 at 13:17, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>
> Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
> mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?

I would be a bit careful about changing behavior in such a major way. What would the results with NaN and infinity then be, with or without optimization? Would the results be consistent with min(nan,num) vs min(num,nan)? Would they be consistent with the new IEEE standard?

In general, I think that min(nan,num) should be nan and that our current behavior is not the best.

Does anybody have data points on how this is handled by other compilers?

Oh, and if anything is changed, then compile and runtime behavior should always be the same.

Regards,
Thomas
On 18/07/18 14:26, Thomas König wrote:
> Hi Kyrill,
>
>> On 18.07.2018 at 13:17, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>>
>> Thomas, Janne, would this relaxation of NaN handling be acceptable given the benefits
>> mentioned above? If so, what would be the recommended adjustment to the nan_1.f90 test?
>
> I would be a bit careful about changing behavior in such a major way. What would the results with NaN and infinity then be, with or without optimization? Would the results be consistent with min(nan,num) vs min(num,nan)? Would they be consistent with the new IEEE standard?
>
> In general, I think that min(nan,num) should be nan and that our current behavior is not the best.
>
> Does anybody have data points on how this is handled by other compilers?
>
> Oh, and if anything is changed, then compile and runtime behavior should always be the same.

Thanks, that makes it clearer what behaviour is acceptable.

So this v3 patch follows Richard Sandiford's suggested approach of emitting IFN_FMIN/FMAX when dealing with floating-point values where NaN handling is important and the target supports IFN_FMIN/FMAX. Otherwise the current explicit comparison sequence is emitted. For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.

With this patch the nan_1.f90 behaviour is preserved on all targets, we get the optimal sequence on aarch64 and on x86_64 we avoid the function call, with no changes in code generation.

This gives the performance improvement on 521.wrf on aarch64 and leaves it unchanged on x86_64.

I'm hoping this addresses all the concerns raised in this thread:
* The NaN-handling behaviour is unchanged on all platforms.
* The fast inline sequence is emitted where it is available.
* No calls to library fmin*/fmax* are emitted where there were none.
* MIN/MAX_EXPR sequences are emitted where possible.

Is this acceptable?
Thanks,
Kyrill

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    * trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR
    or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    * gfortran.dg/max_fmaxl_aarch64.f90: New test.
    * gfortran.dg/min_fminl_aarch64.f90: Likewise.
    * gfortran.dg/minmax_integer.f90: Likewise.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..6f5700f2a421d2a735d77c4c4ec0c4c9c058e727 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"    /* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      mvar = COMP (mvar, a2)
+      mvar = COMP (mvar, a3)
       ...
-      return mvar
+      return mvar;
     }
- */
+    Where COMP is MIN/MAX_EXPR for integral types or when we don't
+    care about NaNs, or IFN_FMIN/MAX when the target has support for
+    fast NaN-honouring min/max.  When neither holds expand a sequence
+    of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
           && argexpr->expr->symtree->n.sym->attr.optional
           && TREE_CODE (val) == INDIRECT_REF)
-        cond = fold_build2_loc (input_location,
+        {
+          cond = fold_build2_loc (input_location,
                                 NE_EXPR, logical_type_node,
                                 TREE_OPERAND (val, 0),
                         build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-        cond = NULL_TREE;
-
+        }
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
         /* Only evaluate the argument once.  */
-        if (!VAR_P (val) && !TREE_CONSTANT (val))
-          val = gfc_evaluate_now (val, &se->pre);
-      }
+        val = gfc_evaluate_now (val, &se->pre);
 
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+      tree calc;
+      /* If we dealing with integral types or we don't care about NaNs
+         just do a MIN/MAX_EXPR.  */
+      if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
+        {
+
+          tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+          calc = fold_build2_loc (input_location, code, type,
+                                  convert (type, val), mvar);
+          tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-                             convert (type, val), mvar);
+        }
+      /* If we care about NaNs and we have internal functions available for
+         fmin/fmax to perform the comparison, use those.  */
+      else if (SCALAR_FLOAT_TYPE_P (type)
+               && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED))
+        {
+          calc = build_call_expr_internal_loc (input_location, ifn, type,
+                                        2, mvar, convert (type, val));
+          tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-         __builtin_isnan might be made dependent on that module being loaded,
-         to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
+        }
+      /* Otherwise expand to:
+          mvar = a1;
+          if (a2 .op. mvar || isnan (mvar))
+            mvar = a2;
+          if (a3 .op. mvar || isnan (mvar))
+            mvar = a3;
+          ...  */
+      else
         {
-          isnan = build_call_expr_loc (input_location,
-                                       builtin_decl_explicit (BUILT_IN_ISNAN),
-                                       1, mvar);
+          tree isnan = build_call_expr_loc (input_location,
+                                            builtin_decl_explicit (BUILT_IN_ISNAN),
+                                            1, mvar);
+          tmp = fold_build2_loc (input_location, op, logical_type_node,
+                                 convert (type, val), mvar);
+
           tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-                                 logical_type_node, tmp,
-                                 fold_convert (logical_type_node, isnan));
+                                  logical_type_node, tmp,
+                                  fold_convert (logical_type_node, isnan));
+          tmp = build3_v (COND_EXPR, tmp,
+                          build2_v (MODIFY_EXPR, mvar, convert (type, val)),
+                          build_empty_stmt (input_location));
         }
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-                      build_empty_stmt (input_location));
 
       if (cond != NULL_TREE)
         tmp = build3_v (COND_EXPR, cond, tmp,
                         build_empty_stmt (input_location));
 
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
@@ -0,0 +1,9 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }
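
The fallback path in the patch keeps the existing explicit comparison sequence, `if (ai .op. mvar || isnan (mvar)) mvar = ai`. A minimal C sketch of that sequence for MIN may help when reasoning about its NaN behaviour. The function name and the array-based signature are hypothetical and only model the expansion; this is not GCC code, and it assumes at least one argument.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical model of the fallback expansion for MIN (a1, a2, ...):
     mvar = a1;
     if (ai < mvar || isnan (mvar)) mvar = ai;
   Assumes nargs >= 1.  Illustrative only, not taken from GCC.  */
static double
min_explicit_sequence (const double *args, size_t nargs)
{
  double mvar = args[0];
  for (size_t i = 1; i < nargs; i++)
    {
      /* A NaN accumulator is always replaced.  A NaN argument never wins
         the ordered comparison, so NaNs are skipped unless every
         argument is a NaN.  */
      if (args[i] < mvar || isnan (mvar))
        mvar = args[i];
    }
  return mvar;
}
```

Under this model the result does not depend on where a NaN sits in the argument list, which is the property the nan_1.f90 test is described as relying on.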
On Wed, Jul 18, 2018 at 5:03 PM, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:

> On 18/07/18 14:26, Thomas König wrote:
>
>> Hi Kyrill,
>>
>>> On 18.07.2018 at 13:17, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>>>
>>> Thomas, Janne, would this relaxation of NaN handling be acceptable given
>>> the benefits mentioned above? If so, what would be the recommended
>>> adjustment to the nan_1.f90 test?
>>
>> I would be a bit careful about changing behavior in such a major way.
>> What would the results with NaN and infinity then be, with or without
>> optimization? Would the results be consistent with min(nan,num) vs
>> min(num,nan)? Would they be consistent with the new IEEE standard?
>>
>> In general, I think that min(nan,num) should be nan and that our current
>> behavior is not the best.
>>
>> Does anybody have data points on how this is handled by other compilers?
>>
>> Oh, and if anything is changed, then compile and runtime behavior should
>> always be the same.
>
> Thanks, that makes it clearer what behaviour is acceptable.
>
> So this v3 patch follows Richard Sandiford's suggested approach of
> emitting IFN_FMIN/FMAX when dealing with floating-point values where NaN
> handling is important and the target supports IFN_FMIN/FMAX. Otherwise the
> current explicit comparison sequence is emitted.
> For integer types and -ffast-math floating-point it will emit MIN/MAX_EXPR.
>
> With this patch the nan_1.f90 behaviour is preserved on all targets, we
> get the optimal sequence on aarch64 and on x86_64 we avoid the function
> call, with no changes in code generation.
>
> This gives the performance improvement on 521.wrf on aarch64 and leaves it
> unchanged on x86_64.
>
> I'm hoping this addresses all the concerns raised in this thread:
> * The NaN-handling behaviour is unchanged on all platforms.
> * The fast inline sequence is emitted where it is available.
> * No calls to library fmin*/fmax* are emitted where there were none.
> * MIN/MAX_EXPR sequences are emitted where possible.
>
> Is this acceptable?

So if I understand it correctly, the "internal fn" thing is a mechanism that allows checking whether the target supports expanding a builtin inline or whether it requires a call to an external library function?

If so, then yes, OK, thanks for the patch!

-- 
Janne Blomqvist
On Wed, Jul 18, 2018 at 4:26 PM, Thomas König <tk@tkoenig.net> wrote:

> Hi Kyrill,
>
> > On 18.07.2018 at 13:17, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
> >
> > Thomas, Janne, would this relaxation of NaN handling be acceptable given
> > the benefits mentioned above? If so, what would be the recommended
> > adjustment to the nan_1.f90 test?
>
> I would be a bit careful about changing behavior in such a major way. What
> would the results with NaN and infinity then be, with or without
> optimization? Would the results be consistent with min(nan,num) vs
> min(num,nan)? Would they be consistent with the new IEEE standard?

AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with Inf. For NaN the result is undefined, and you might indeed have

min(a, NaN) = a
min(NaN, a) = NaN

where "a" is a normal number.

(I think that happens at least on x86 if MIN_EXPR is expanded to minsd/minpd.)

Apparently what the proper result for min(a, NaN) should be is contentious enough that minnum was removed from the upcoming IEEE 754 revision, and the new operations AFAICS have the semantics

minimum(a, NaN) = minimum(NaN, a) = NaN
minimumNumber(a, NaN) = minimumNumber(NaN, a) = a

That is, minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in C, and to the current behavior of gfortran.

> In general, I think that min(nan,num) should be nan and that our current
> behavior is not the best.

There was some extensive discussion of that in the Julia bug report I linked to in an earlier message, and they came to the same conclusion and changed their behavior.

> Does anybody have data points on how this is handled by other compilers?

The only other compiler I have access to at the moment is ifort (and not the latest version), but maybe somebody has access to a wider variety?

> Oh, and if anything is changed, then compile and runtime behavior should
> always be the same.

Well, iff we place some weight on the runtime behavior being particularly sensible wrt NaNs, which it wouldn't be if we just use a plain MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In particular, if other compilers are inconsistent, we might as well do whatever is fastest.

-- 
Janne Blomqvist
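
The three NaN behaviours mentioned in this message can be written out as a small C sketch. All function names are hypothetical; `min_select` only models an order-dependent select that reproduces the min(a, NaN) = a, min(NaN, a) = NaN pattern above (which argument order yields the NaN depends on how the operands are arranged), and signaling NaNs and signed zeros are deliberately ignored.

```c
#include <math.h>

/* Order-dependent select: returns the first operand whenever the
   comparison is false, which includes every comparison with a NaN.  */
static double
min_select (double a, double b)
{
  return b < a ? b : a;
}

/* NaN-propagating: the result is NaN if either operand is NaN
   (the "minimum" semantics quoted above).  */
static double
min_propagate (double a, double b)
{
  if (isnan (a))
    return a;
  if (isnan (b))
    return b;
  return b < a ? b : a;
}

/* NaN-ignoring: a single NaN operand is skipped
   (the "minimumNumber" semantics quoted above, quiet NaNs only).  */
static double
min_number (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;
  return b < a ? b : a;
}
```

Only the first variant gives different answers for `(1.0, NAN)` and `(NAN, 1.0)`; the other two are symmetric, one always returning NaN and the other always returning 1.0.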
Thanks for doing this.

Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> writes:
> +          calc = build_call_expr_internal_loc (input_location, ifn, type,
> +                                        2, mvar, convert (type, val));

(indentation looks off)

> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
> @@ -0,0 +1,9 @@
> +! { dg-do compile { target aarch64*-*-* } }
> +! { dg-options "-O2 -fdump-tree-optimized" }
> +
> +subroutine fool (a, b, c, d, e, f, g, h)
> +  real (kind=16) :: a, b, c, d, e, f, g, h
> +  a = max (a, b, c, d, e, f, g, h)
> +end subroutine
> +
> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
> new file mode 100644
> index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
> --- /dev/null
> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
> @@ -0,0 +1,9 @@
> +! { dg-do compile { target aarch64*-*-* } }
> +! { dg-options "-O2 -fdump-tree-optimized" }
> +
> +subroutine fool (a, b, c, d, e, f, g, h)
> +  real (kind=16) :: a, b, c, d, e, f, g, h
> +  a = min (a, b, c, d, e, f, g, h)
> +end subroutine
> +
> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }

Do these still pass? I wouldn't have expected us to use __builtin_fmin* and __builtin_fmax* now.

It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4 and kind=8 on AArch64, since that's really the end goal here.

Thanks,
Richard
Hi Richard,

On 18/07/18 16:27, Richard Sandiford wrote:
> Thanks for doing this.
>
> Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> writes:
>> +          calc = build_call_expr_internal_loc (input_location, ifn, type,
>> +                                        2, mvar, convert (type, val));
> (indentation looks off)
>
>> diff --git a/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..8c8ea063e5d0718dc829c1f5574c5b46040e6786
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/max_fmaxl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = max (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fmaxl " 7 "optimized" } }
>> diff --git a/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> new file mode 100644
>> index 0000000000000000000000000000000000000000..92368917fb48e0c468a16d080ab3a9ac842e01a7
>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/min_fminl_aarch64.f90
>> @@ -0,0 +1,9 @@
>> +! { dg-do compile { target aarch64*-*-* } }
>> +! { dg-options "-O2 -fdump-tree-optimized" }
>> +
>> +subroutine fool (a, b, c, d, e, f, g, h)
>> +  real (kind=16) :: a, b, c, d, e, f, g, h
>> +  a = min (a, b, c, d, e, f, g, h)
>> +end subroutine
>> +
>> +! { dg-final { scan-tree-dump-times "__builtin_fminl " 7 "optimized" } }
> Do these still pass? I wouldn't have expected us to use __builtin_fmin*
> and __builtin_fmax* now.
>
> It would be good to have tests that we use ".FMIN" and ".FMAX" for kind=4
> and kind=8 on AArch64, since that's really the end goal here.

Doh, yes. I had spotted that myself after I had sent out the patch.
I've fixed that and the indentation issue in this small revision.
Given Janne's comments I will commit this tomorrow if there are no objections.
This patch should be a conservative improvement. If the Fortran folks decide to sacrifice the more predictable NaN handling in favour of more optimisation leeway by using MIN/MAX_EXPR unconditionally we can do that as a follow-up.

Thanks for the help,
Kyrill

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    * trans-intrinsic.c: (gfc_conv_intrinsic_minmax): Emit MIN_MAX_EXPR
    or IFN_FMIN/FMAX sequence to calculate the min/max when possible.

2018-07-18  Kyrylo Tkachov  <kyrylo.tkachov@arm.com>

    * gfortran.dg/max_fmax_aarch64.f90: New test.
    * gfortran.dg/min_fmin_aarch64.f90: Likewise.
    * gfortran.dg/minmax_integer.f90: Likewise.

diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..c9b5479740c3f98f906132fda5c252274c4b6edd 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -31,6 +31,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "trans.h"
 #include "stringpool.h"
 #include "fold-const.h"
+#include "internal-fn.h"
 #include "tree-nested.h"
 #include "stor-layout.h"
 #include "toplev.h"    /* For rest_of_decl_compilation.  */
@@ -3874,14 +3875,15 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      mvar = COMP (mvar, a2)
+      mvar = COMP (mvar, a3)
       ...
-      return mvar
+      return mvar;
     }
- */
+    Where COMP is MIN/MAX_EXPR for integral types or when we don't
+    care about NaNs, or IFN_FMIN/MAX when the target has support for
+    fast NaN-honouring min/max.  When neither holds expand a sequence
+    of explicit comparisons.  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3893,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3913,77 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
 
+  internal_fn ifn = op == GT_EXPR ? IFN_FMAX : IFN_FMIN;
+
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
           && argexpr->expr->symtree->n.sym->attr.optional
           && TREE_CODE (val) == INDIRECT_REF)
-        cond = fold_build2_loc (input_location,
+        {
+          cond = fold_build2_loc (input_location,
                                 NE_EXPR, logical_type_node,
                                 TREE_OPERAND (val, 0),
                         build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-      {
-        cond = NULL_TREE;
-
+        }
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
         /* Only evaluate the argument once.  */
-        if (!VAR_P (val) && !TREE_CONSTANT (val))
-          val = gfc_evaluate_now (val, &se->pre);
-      }
+        val = gfc_evaluate_now (val, &se->pre);
 
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+      tree calc;
+      /* If we dealing with integral types or we don't care about NaNs
+         just do a MIN/MAX_EXPR.  */
+      if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type))
+        {
+
+          tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+          calc = fold_build2_loc (input_location, code, type,
+                                  convert (type, val), mvar);
+          tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-                             convert (type, val), mvar);
+        }
+      /* If we care about NaNs and we have internal functions available for
+         fmin/fmax to perform the comparison, use those.  */
+      else if (SCALAR_FLOAT_TYPE_P (type)
+               && direct_internal_fn_supported_p (ifn, type, OPTIMIZE_FOR_SPEED))
+        {
+          calc = build_call_expr_internal_loc (input_location, ifn, type,
+                                               2, mvar, convert (type, val));
+          tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-         __builtin_isnan might be made dependent on that module being loaded,
-         to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
+        }
+      /* Otherwise expand to:
+          mvar = a1;
+          if (a2 .op. mvar || isnan (mvar))
+            mvar = a2;
+          if (a3 .op. mvar || isnan (mvar))
+            mvar = a3;
+          ...  */
+      else
         {
-          isnan = build_call_expr_loc (input_location,
-                                       builtin_decl_explicit (BUILT_IN_ISNAN),
-                                       1, mvar);
+          tree isnan = build_call_expr_loc (input_location,
+                                            builtin_decl_explicit (BUILT_IN_ISNAN),
+                                            1, mvar);
+          tmp = fold_build2_loc (input_location, op, logical_type_node,
+                                 convert (type, val), mvar);
+
           tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-                                 logical_type_node, tmp,
-                                 fold_convert (logical_type_node, isnan));
+                                  logical_type_node, tmp,
+                                  fold_convert (logical_type_node, isnan));
+          tmp = build3_v (COND_EXPR, tmp,
+                          build2_v (MODIFY_EXPR, mvar, convert (type, val)),
+                          build_empty_stmt (input_location));
         }
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-                      build_empty_stmt (input_location));
 
       if (cond != NULL_TREE)
         tmp = build3_v (COND_EXPR, cond, tmp,
                         build_empty_stmt (input_location));
 
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_fmax_aarch64.f90 b/gcc/testsuite/gfortran.dg/max_fmax_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..b818241a1f9aa7018efaf300cfecb70f413b7573
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_fmax_aarch64.f90
@@ -0,0 +1,15 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+
+! { dg-final { scan-tree-dump-times "\.FMAX " 14 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_fmin_aarch64.f90 b/gcc/testsuite/gfortran.dg/min_fmin_aarch64.f90
new file mode 100644
index 0000000000000000000000000000000000000000..009869b497df7737089971e00c01e1c29c0a3032
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_fmin_aarch64.f90
@@ -0,0 +1,15 @@
+! { dg-do compile { target aarch64*-*-* } }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "\.FMIN " 14 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }
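
The counts in the dg-final directives (7 per eight-argument call, 14 for two such subroutines) presumably follow from the left-to-right reduction `mvar = COMP (mvar, ai)` described in the patch comment: N arguments need N - 1 pairwise operations. A minimal C sketch of that reduction with a call counter, where `comp_max`, `max_reduce` and `comp_calls` are hypothetical names used only for illustration:

```c
#include <stddef.h>

/* Number of pairwise operations performed so far (illustrative counter).  */
static size_t comp_calls;

/* Stand-in for one COMP step; a plain select is enough to count calls.  */
static double
comp_max (double mvar, double val)
{
  comp_calls++;
  return val > mvar ? val : mvar;
}

/* mvar = a1; mvar = COMP (mvar, a2); mvar = COMP (mvar, a3); ...
   Assumes nargs >= 1.  */
static double
max_reduce (const double *args, size_t nargs)
{
  double mvar = args[0];
  for (size_t i = 1; i < nargs; i++)
    mvar = comp_max (mvar, args[i]);
  return mvar;
}
```

Calling `max_reduce` on eight values increments `comp_calls` by seven, and a second eight-argument call brings the total to fourteen.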
On Wed, 18 Jul 2018, Janne Blomqvist wrote:

> minimumNumber(a, NaN) = minimumNumber(NaN, a) = a
>
> That is minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in

No, it differs in the handling of signaling NaNs (with minimumNumber, if the NaN argument is signaling, it results in the "invalid" exception but the non-NaN argument is still returned, whereas with minNum, a quiet NaN was returned in that case). A new fminimum_num function is proposed as a C binding to the new operation.

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2273.pdf

(The new operations are also more strictly defined regarding zero arguments, to treat -0 as less than +0, which was unspecified for minNum and fmin.)

-- 
Joseph S. Myers
joseph@codesourcery.com
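
The zero-ordering rule described here can be sketched in C as follows. This is only a model under stated assumptions: `minimum_number_sketch` is a hypothetical name, it is not the proposed fminimum_num binding, and it does not model the "invalid" exception for signaling NaNs at all.

```c
#include <math.h>

/* Hypothetical sketch of a minimumNumber-style operation for quiet NaNs:
   a single NaN operand is ignored, and -0 is ordered below +0.  */
static double
minimum_number_sketch (double a, double b)
{
  if (isnan (a))
    return b;
  if (isnan (b))
    return a;
  /* -0.0 == 0.0 compares equal, so the sign bit has to break the tie.  */
  if (a == 0.0 && b == 0.0)
    return signbit (a) ? a : b;
  return a < b ? a : b;
}
```

With a plain `a < b ? a : b`, the result for the pair (+0, -0) would depend on argument order, which matches the remark that the zero case was unspecified for minNum and fmin.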
On Wed, Jul 18, 2018 at 6:10 PM, Janne Blomqvist <blomqvist.janne@gmail.com> wrote:

> On Wed, Jul 18, 2018 at 4:26 PM, Thomas König <tk@tkoenig.net> wrote:
>
>> Hi Kyrill,
>>
>> > On 18.07.2018 at 13:17, Kyrill Tkachov <kyrylo.tkachov@foss.arm.com> wrote:
>> >
>> > Thomas, Janne, would this relaxation of NaN handling be acceptable
>> > given the benefits mentioned above? If so, what would be the
>> > recommended adjustment to the nan_1.f90 test?
>>
>> I would be a bit careful about changing behavior in such a major way.
>> What would the results with NaN and infinity then be, with or without
>> optimization? Would the results be consistent with min(nan,num) vs
>> min(num,nan)? Would they be consistent with the new IEEE standard?
>
> AFAIU, MIN/MAX_EXPR do the right thing when comparing a normal number with
> Inf. For NaN the result is undefined, and you might indeed have
>
> min(a, NaN) = a
> min(NaN, a) = NaN
>
> where "a" is a normal number.
>
> (I think that happens at least on x86 if MIN_EXPR is expanded to
> minsd/minpd.)
>
> Apparently what the proper result for min(a, NaN) should be is contentious
> enough that minnum was removed from the upcoming IEEE 754 revision, and the
> new operations AFAICS have the semantics
>
> minimum(a, NaN) = minimum(NaN, a) = NaN
> minimumNumber(a, NaN) = minimumNumber(NaN, a) = a
>
> That is, minimumNumber corresponds to minnum in IEEE 754-2008 and fmin* in
> C, and to the current behavior of gfortran.
>
>> In general, I think that min(nan,num) should be nan and that our current
>> behavior is not the best.
>
> There was some extensive discussion of that in the Julia bug report I
> linked to in an earlier message, and they came to the same conclusion and
> changed their behavior.
>
>> Does anybody have data points on how this is handled by other compilers?
>
> The only other compiler I have access to at the moment is ifort (and not
> the latest version), but maybe somebody has access to a wider variety?
>
>> Oh, and if anything is changed, then compile and runtime behavior should
>> always be the same.
>
> Well, iff we place some weight on the runtime behavior being particularly
> sensible wrt NaNs, which it wouldn't be if we just use a plain
> MIN/MAX_EXPR. Is it worth taking a performance hit for, though? In
> particular, if other compilers are inconsistent, we might as well do
> whatever is fastest.
>
> --
> Janne Blomqvist

The testcase below (with the functions in a separate file to prevent interprocedural and constant-propagation optimizations):

program main
  implicit none
  real :: a, b = 1., mymax, mydiv
  external mymax, mydiv
  a = mydiv(0., 0.)
  print *, 'Verify that the following value is a NaN: ', a
  print *, 'max(', a, ',', b, ') = ', mymax(a, b)
  print *, 'max(', b, ',', a, ') = ', mymax(b, a)
  a = mydiv(1., 0.)
  print *, 'Verify that the following is a Inf: ', a
  print *, 'max(', a, ',', b, ') = ', mymax(a, b)
  print *, 'max(', b, ',', a, ') = ', mymax(b, a)
end program main

real function mymax(a, b)
  implicit none
  real :: a, b
  mymax = max(a, b)
end function mymax

real function mydiv(a, b)
  implicit none
  real :: a, b
  mydiv = a/b
end function mydiv

With gfortran 6.2 (I didn't bother to check other versions as it shouldn't have changed lately) and Intel Fortran 17.0.1 I get the following:

% gfortran main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN: NaN
 max( NaN , 1.00000000 ) = 1.00000000
 max( 1.00000000 , NaN ) = 1.00000000
 Verify that the following is a Inf: Infinity
 max( Infinity , 1.00000000 ) = Infinity
 max( 1.00000000 , Infinity ) = Infinity

% gfortran -ffast-math main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN: NaN
 max( NaN , 1.00000000 ) = NaN
 max( 1.00000000 , NaN ) = 1.00000000
 Verify that the following is a Inf: Infinity
 max( Infinity , 1.00000000 ) = Infinity
 max( 1.00000000 , Infinity ) = Infinity

% ifort main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN: NaN
 max( NaN , 1.000000 ) = 1.000000
 max( 1.000000 , NaN ) = NaN
 Verify that the following is a Inf: Infinity
 max( Infinity , 1.000000 ) = Infinity
 max( 1.000000 , Infinity ) = Infinity

% ifort -fp-model strict main.f90 my.f90 && ./a.out
 Verify that the following value is a NaN: NaN
 max( NaN , 1.000000 ) = 1.000000
 max( 1.000000 , NaN ) = NaN
 Verify that the following is a Inf: Infinity
 max( Infinity , 1.000000 ) = Infinity
 max( 1.000000 , Infinity ) = Infinity

For brevity I have omitted tests with various -O[N] optimization levels, which didn't affect the results on either gfortran or ifort.

This suggests that ifort does the equivalent of MAX_EXPR unconditionally.

Does anyone have access to other compilers, and what results do they give?

-- 
Janne Blomqvist
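
The three result patterns in the output above can be reproduced by three small C selects. This is only a sketch that matches the printed values; the function names are hypothetical and nothing here claims to be what gfortran or ifort actually emit.

```c
#include <math.h>

/* Matches the default gfortran output: the explicit sequence
   "if (b > mvar || isnan (mvar)) mvar = b" with mvar starting as a.  */
static float
max_ignore_nan (float a, float b)
{
  if (b > a || isnan (a))
    return b;
  return a;
}

/* Matches the gfortran -ffast-math output: keeps the first operand
   unless the second compares strictly greater.  */
static float
max_keep_first (float a, float b)
{
  return b > a ? b : a;
}

/* Matches the ifort output: returns the second operand unless the
   first compares strictly greater.  */
static float
max_keep_second (float a, float b)
{
  return a > b ? a : b;
}
```

For the pair (NaN, 1.0) these return 1.0, NaN and 1.0 respectively, and for (1.0, NaN) they return 1.0, 1.0 and NaN, while all three agree on Infinity.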
diff --git a/gcc/fortran/trans-intrinsic.c b/gcc/fortran/trans-intrinsic.c
index d306e3a5a6209c1621d91f99ffc366acecd9c3d0..e5a1f1ddabeedc7b9f473db11e70f29548fc69ac 100644
--- a/gcc/fortran/trans-intrinsic.c
+++ b/gcc/fortran/trans-intrinsic.c
@@ -3874,14 +3874,11 @@ gfc_conv_intrinsic_ttynam (gfc_se * se, gfc_expr * expr)
     minmax (a1, a2, a3, ...)
     {
       mvar = a1;
-      if (a2 .op. mvar || isnan (mvar))
-        mvar = a2;
-      if (a3 .op. mvar || isnan (mvar))
-        mvar = a3;
+      mvar = MIN/MAX_EXPR (mvar, a2);
+      mvar = MIN/MAX_EXPR (mvar, a3);
       ...
-      return mvar
-    }
- */
+      return mvar;
+    }  */
 
 /* TODO: Mismatching types can occur when specific names are used.
    These should be handled during resolution.  */
@@ -3891,7 +3888,6 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
   tree tmp;
   tree mvar;
   tree val;
-  tree thencase;
   tree *args;
   tree type;
   gfc_actual_arglist *argexpr;
@@ -3912,55 +3908,37 @@ gfc_conv_intrinsic_minmax (gfc_se * se, gfc_expr * expr, enum tree_code op)
 
   mvar = gfc_create_var (type, "M");
   gfc_add_modify (&se->pre, mvar, args[0]);
-  for (i = 1, argexpr = argexpr->next; i < nargs; i++)
-    {
-      tree cond, isnan;
+  for (i = 1, argexpr = argexpr->next; i < nargs; i++, argexpr = argexpr->next)
+    {
+      tree cond = NULL_TREE;
 
       val = args[i];
 
       /* Handle absent optional arguments by ignoring the comparison.  */
       if (argexpr->expr->expr_type == EXPR_VARIABLE
           && argexpr->expr->symtree->n.sym->attr.optional
           && TREE_CODE (val) == INDIRECT_REF)
-        cond = fold_build2_loc (input_location,
+        {
+          cond = fold_build2_loc (input_location,
                                 NE_EXPR, logical_type_node,
                                 TREE_OPERAND (val, 0),
                         build_int_cst (TREE_TYPE (TREE_OPERAND (val, 0)), 0));
-      else
-        {
-          cond = NULL_TREE;
-
+        }
+      else if (!VAR_P (val) && !TREE_CONSTANT (val))
           /* Only evaluate the argument once.  */
-          if (!VAR_P (val) && !TREE_CONSTANT (val))
-            val = gfc_evaluate_now (val, &se->pre);
-        }
-
-      thencase = build2_v (MODIFY_EXPR, mvar, convert (type, val));
+        val = gfc_evaluate_now (val, &se->pre);
 
-      tmp = fold_build2_loc (input_location, op, logical_type_node,
-                             convert (type, val), mvar);
+      tree calc;
 
-      /* FIXME: When the IEEE_ARITHMETIC module is implemented, the call to
-         __builtin_isnan might be made dependent on that module being loaded,
-         to help performance of programs that don't rely on IEEE semantics.  */
-      if (FLOAT_TYPE_P (TREE_TYPE (mvar)))
-        {
-          isnan = build_call_expr_loc (input_location,
-                                       builtin_decl_explicit (BUILT_IN_ISNAN),
-                                       1, mvar);
-          tmp = fold_build2_loc (input_location, TRUTH_OR_EXPR,
-                                 logical_type_node, tmp,
-                                 fold_convert (logical_type_node, isnan));
-        }
-      tmp = build3_v (COND_EXPR, tmp, thencase,
-                      build_empty_stmt (input_location));
+      tree_code code = op == GT_EXPR ? MAX_EXPR : MIN_EXPR;
+      calc = fold_build2_loc (input_location, code, type,
+                              convert (type, val), mvar);
+      tmp = build2_v (MODIFY_EXPR, mvar, calc);
 
       if (cond != NULL_TREE)
         tmp = build3_v (COND_EXPR, cond, tmp,
                         build_empty_stmt (input_location));
-
       gfc_add_expr_to_block (&se->pre, tmp);
-      argexpr = argexpr->next;
     }
   se->expr = mvar;
 }
diff --git a/gcc/testsuite/gfortran.dg/max_float.f90 b/gcc/testsuite/gfortran.dg/max_float.f90
new file mode 100644
index 0000000000000000000000000000000000000000..a3a5d4f5df29cfa9c4e3abc2c18e7d3de1169fc3
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/max_float.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MAX_EXPR " 21 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/min_float.f90 b/gcc/testsuite/gfortran.dg/min_float.f90
new file mode 100644
index 0000000000000000000000000000000000000000..41bd6b3c4062f364791841f7097f9a5c00782ec8
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/min_float.f90
@@ -0,0 +1,19 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine fool (a, b, c, d, e, f, g, h)
+  real (kind=16) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  real (kind=8) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  real (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR " 21 "optimized" } }
diff --git a/gcc/testsuite/gfortran.dg/minmax_integer.f90 b/gcc/testsuite/gfortran.dg/minmax_integer.f90
new file mode 100644
index 0000000000000000000000000000000000000000..5b6be38c7055ce4e8620cf75ec7d8a182436b24f
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/minmax_integer.f90
@@ -0,0 +1,15 @@
+! { dg-do compile }
+! { dg-options "-O2 -fdump-tree-optimized" }
+
+subroutine foo (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = min (a, b, c, d, e, f, g, h)
+end subroutine
+
+subroutine foof (a, b, c, d, e, f, g, h)
+  integer (kind=4) :: a, b, c, d, e, f, g, h
+  a = max (a, b, c, d, e, f, g, h)
+end subroutine
+
+! { dg-final { scan-tree-dump-times "MIN_EXPR" 7 "optimized" } }
+! { dg-final { scan-tree-dump-times "MAX_EXPR" 7 "optimized" } }
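
For readers who want to see what the two expansions in this diff compute, here is a rough C model of the old and new reduction loops for max. The names max_old, max_new and max_expr_model are hypothetical. In particular, max_expr_model is only an assumed stand-in: GCC leaves the result of MAX_EXPR unspecified when an operand is a NaN, so the ternary below is one possible outcome, not the defined behaviour.

```c
#include <math.h>
#include <stddef.h>

/* Old expansion, following the comment the patch removes:
     if (a_i .op. mvar || isnan (mvar)) mvar = a_i;
   Requires n >= 1.  */
double max_old (const double *a, size_t n)
{
  double mvar = a[0];
  for (size_t i = 1; i < n; i++)
    if (a[i] > mvar || isnan (mvar))
      mvar = a[i];
  return mvar;
}

/* Assumed model of MAX_EXPR (val, mvar); the NaN result is really
   unspecified, this picks the second operand when the compare fails.  */
double max_expr_model (double val, double mvar)
{
  return val > mvar ? val : mvar;
}

/* New expansion: one MAX_EXPR per argument after the first, so an
   n-argument call performs n - 1 operations.  Requires n >= 1.  */
double max_new (const double *a, size_t n)
{
  double mvar = a[0];
  for (size_t i = 1; i < n; i++)
    mvar = max_expr_model (a[i], mvar);
  return mvar;
}
```

For NaN-free input the two loops agree. With a leading NaN, such as {NaN, 1.0, 2.0}, max_old replaces the NaN and returns 2.0, whereas max_new under this model keeps returning NaN. The n - 1 operation count also matches the new tests: each 8-argument call yields 7 MIN/MAX_EXPR, giving 21 across the three floating-point subroutines and 7 each in the integer test.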