Message ID  5613133.Bo1V3R3Yke@polaris 

State  New 
Headers  show 
Series 

Related  show 
On Fri, May 31, 2019 at 11:56 AM Eric Botcazou <ebotcazou@adacore.com> wrote: > > Hi, > > this is a regression present for 32bit targets on mainline and 9 branch only, > but the underlying issue dates back from the removal of the TYPE_IS_SIZETYPE > machinery. Since the removal of this machinery, there is an internal conflict > for sizetype: it's an unsigned type so with wraparound semantics on overflow > but overflow nevertheless needs to be tracked for it in order to avoid buffer > overflows for large objects. > > The constant folder contains various tricks to that effect and the Ada front > end even more so, because VLAs are firstclass citizens in the language and > their lower bound is not necessarily 0. In particular, you need to be able to > distinguish large constants in sizetype from small negative constants turned > into even larger constants, because the latter appear in the length of e.g.: > > type Arr is array (2 .. N) of Long_Long_Integer; > > This works when the expressions haven't been too much mangled by the constant > folder and here we have a counterexample for 32bit targets. Starting with: > > (N + 4294967295) * 8 > > which is the sizetype expression of the size of Arr in bytes, extract_muldiv_1 > distributes the multiplication: > > N * 8 + 4294967288 > > and, immediately after, fold_plusminus_mult_expr factors it back: > > (N + 536870911) * 8 > > At this point, there is no way for gigi to tell that it's the expression of a > simple array with no overflow in sight because the 536870911 constant is large > but not large enough, so gigi flags it as overflowing for small values of N. > > I don't see any other way out than disabling the backandforth mathematical > game played by the constant folder in this case, which very likely brings no > benefit in practice, hence the proposed fix. > > Tested on x86_64suselinux, OK for the mainline and 9 branch? Hmm, ISTR we had such mitigations in place (or have) elsewhere keying on the most significant bit set instead of poweroftwo. But your case likely recurses and runs into the extract_multiv limiting to eventually stop, even for (N + 4) * 8, right? If so shouldn't we prevent this even for !TYPE_OVERFLOW_WRAPS? Also + && !(tree_fits_shwi_p (c) + && exact_log2 (absu_hwi (tree_to_shwi (c))) > 0)) is better written as && exact_log2 (wi::to_wide (c)) > 0 not sure why the sizetype constant for you fits in a signed HWI or you need to compute its absolute value. Eventually you need to use wi::abs(wide_int::from (wi::to_wide (c), TYPE_PRECISION (TREE_TYPE (c)), SIGNED)) or so. Thanks, Richard. > > 20190531 Eric Botcazou <ebotcazou@adacore.com> > > * foldconst.c (extract_muldiv_1) <PLUS_EXPR>: Do not distribute a > multiplication by a poweroftwo value. > > > 20190531 Eric Botcazou <ebotcazou@adacore.com> > > * gnat.dg/specs/discr6.ads: New test. > >  > Eric Botcazou
> Hmm, ISTR we had such mitigations in place (or have) elsewhere keying > on the most significant bit set instead of poweroftwo. fold_plusminus_mult_expr only factors out for poweroftwo: if (exact_log2 (absu_hwi (int11)) > 0 && int01 % int11 == 0 /* The remainder should not be a constant, otherwise we end up folding i * 4 + 2 to (i * 2 + 1) * 2 which has increased the number of multiplications necessary. */ && TREE_CODE (arg10) != INTEGER_CST) > But your case > likely recurses and runs into the extract_multiv limiting to eventually > stop, even for (N + 4) * 8, right? Yes, it oscillates between extract_multiv and fold_plusminus_mult_expr until reaching the maximal depth. > If so shouldn't we prevent this even for !TYPE_OVERFLOW_WRAPS? Also > > + && !(tree_fits_shwi_p (c) > + && exact_log2 (absu_hwi (tree_to_shwi (c))) > 0)) The code only distributes for TYPE_OVERFLOW_WRAPS though: /* The last case is if we are a multiply. In that case, we can apply the distributive law to commute the multiply and addition if the multiplication of the constants doesn't overflow and overflow is defined. With undefined overflow op0 * c might overflow, while (op0 + orig_op1) * c doesn't. */ if (code == MULT_EXPR && TYPE_OVERFLOW_WRAPS (ctype)) return fold_build2 (tcode, ctype, fold_build2 (code, ctype, fold_convert (ctype, op0), fold_convert (ctype, c)), op1); > is better written as > > && exact_log2 (wi::to_wide (c)) > 0 > > not sure why the sizetype constant for you fits in a signed HWI > or you need to compute its absolute value. Eventually you > need to use wi::abs(wide_int::from (wi::to_wide (c), TYPE_PRECISION > (TREE_TYPE (c)), SIGNED)) > or so. This is just mirrored on what fold_plusminus_mult_expr does.  Eric Botcazou
> Hmm, ISTR we had such mitigations in place (or have) elsewhere keying > on the most significant bit set instead of poweroftwo. But your case > likely recurses and runs into the extract_multiv limiting to eventually > stop, even for (N + 4) * 8, right? If so shouldn't we prevent this > even for !TYPE_OVERFLOW_WRAPS? Also > > + && !(tree_fits_shwi_p (c) > + && exact_log2 (absu_hwi (tree_to_shwi (c))) > 0)) > > is better written as > > && exact_log2 (wi::to_wide (c)) > 0 It turns out that pow2p_hwi can be used instead and is cheaper, so I have changed both extract_muldiv_1 and fold_plusminus_mult_expr to using it. * foldconst.c (extract_muldiv_1) <PLUS_EXPR>: Do not distribute a multiplication by a poweroftwo value. (fold_plusminus_mult_expr): Use pow2p_hwi to detect a poweroftwo value and turn the modulo operation into a masking operation.  Eric Botcazou Index: foldconst.c ===================================================================  foldconst.c (revision 271694) +++ foldconst.c (working copy) @@ 6475,8 +6475,12 @@ extract_muldiv_1 (tree t, tree c, enum t apply the distributive law to commute the multiply and addition if the multiplication of the constants doesn't overflow and overflow is defined. With undefined overflow  op0 * c might overflow, while (op0 + orig_op1) * c doesn't. */  if (code == MULT_EXPR && TYPE_OVERFLOW_WRAPS (ctype)) + op0 * c might overflow, while (op0 + orig_op1) * c doesn't. + But fold_plusminus_mult_expr would factor back any poweroftwo + value so do not distribute in the first place in this case. */ + if (code == MULT_EXPR + && TYPE_OVERFLOW_WRAPS (ctype) + && !(tree_fits_shwi_p (c) && pow2p_hwi (absu_hwi (tree_to_shwi (c))))) return fold_build2 (tcode, ctype, fold_build2 (code, ctype, fold_convert (ctype, op0), @@ 7124,14 +7128,13 @@ fold_plusminus_mult_expr (location_t loc /* No identical multiplicands; see if we can find a common poweroftwo factor in nonpoweroftwo multiplies. This can help in multidimensional array access. */  else if (tree_fits_shwi_p (arg01)  && tree_fits_shwi_p (arg11)) + else if (tree_fits_shwi_p (arg01) && tree_fits_shwi_p (arg11)) {  HOST_WIDE_INT int01, int11, tmp; + HOST_WIDE_INT int01 = tree_to_shwi (arg01); + HOST_WIDE_INT int11 = tree_to_shwi (arg11); + HOST_WIDE_INT tmp; bool swap = false; tree maybe_same;  int01 = tree_to_shwi (arg01);  int11 = tree_to_shwi (arg11); /* Move min of absolute values to int11. */ if (absu_hwi (int01) < absu_hwi (int11)) @@ 7144,7 +7147,10 @@ fold_plusminus_mult_expr (location_t loc else maybe_same = arg11;  if (exact_log2 (absu_hwi (int11)) > 0 && int01 % int11 == 0 + unsigned HOST_WIDE_INT factor = absu_hwi (int11); + if (factor > 1 + && pow2p_hwi (factor) + && (int01 & (factor  1)) == 0 /* The remainder should not be a constant, otherwise we end up folding i * 4 + 2 to (i * 2 + 1) * 2 which has increased the number of multiplications necessary. */
On Mon, Jun 3, 2019 at 12:38 PM Eric Botcazou <ebotcazou@adacore.com> wrote: > > > Hmm, ISTR we had such mitigations in place (or have) elsewhere keying > > on the most significant bit set instead of poweroftwo. But your case > > likely recurses and runs into the extract_multiv limiting to eventually > > stop, even for (N + 4) * 8, right? If so shouldn't we prevent this > > even for !TYPE_OVERFLOW_WRAPS? Also > > > > + && !(tree_fits_shwi_p (c) > > + && exact_log2 (absu_hwi (tree_to_shwi (c))) > 0)) > > > > is better written as > > > > && exact_log2 (wi::to_wide (c)) > 0 > > It turns out that pow2p_hwi can be used instead and is cheaper, so I have > changed both extract_muldiv_1 and fold_plusminus_mult_expr to using it. OK, thanks. Richard. > > * foldconst.c (extract_muldiv_1) <PLUS_EXPR>: Do not distribute a > multiplication by a poweroftwo value. > (fold_plusminus_mult_expr): Use pow2p_hwi to detect a poweroftwo value > and turn the modulo operation into a masking operation. > >  > Eric Botcazou
Index: foldconst.c ===================================================================  foldconst.c (revision 271694) +++ foldconst.c (working copy) @@ 6475,8 +6475,13 @@ extract_muldiv_1 (tree t, tree c, enum t apply the distributive law to commute the multiply and addition if the multiplication of the constants doesn't overflow and overflow is defined. With undefined overflow  op0 * c might overflow, while (op0 + orig_op1) * c doesn't. */  if (code == MULT_EXPR && TYPE_OVERFLOW_WRAPS (ctype)) + op0 * c might overflow, while (op0 + orig_op1) * c doesn't. + But fold_plusminus_mult_expr would factor back any poweroftwo + value so do not distribute in the first place in this case. */ + if (code == MULT_EXPR + && TYPE_OVERFLOW_WRAPS (ctype) + && !(tree_fits_shwi_p (c) + && exact_log2 (absu_hwi (tree_to_shwi (c))) > 0)) return fold_build2 (tcode, ctype, fold_build2 (code, ctype, fold_convert (ctype, op0),