[07/13] Make vec_perm_indices use new vector encoding

Message ID 87lgiblc9q.fsf@linaro.org
State New
Headers show
Series
  • Make VEC_PERM_EXPR work for variable-length vectors
Related show

Commit Message

Richard Sandiford Dec. 9, 2017, 11:20 p.m.
This patch changes vec_perm_indices from a plain vec<> to a class
that stores a canonicalised permutation, using the same encoding
as for VECTOR_CSTs.  This means that vec_perm_indices now carries
information about the number of vectors being permuted (currently
always 1 or 2) and the number of elements in each input vector.

A new vec_perm_builder class is used to actually build up the vector,
like tree_vector_builder does for trees.  vec_perm_indices is the
completed representation, a bit like VECTOR_CST is for trees.

The patch just does a mechanical conversion of the code to
vec_perm_builder: a later patch uses explicit encodings where possible.

The point of all this is that it makes the representation suitable
for variable-length vectors.  It's no longer necessary for the
underlying vec<>s to store every element explicitly.

In int-vector-builder.h, "using the same encoding as tree and rtx constants"
describes the endpoint -- adding the rtx encoding comes later.


2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* int-vector-builder.h: New file.
	* vec-perm-indices.h: Include int-vector-builder.h.
	(vec_perm_indices): Redefine as an int_vector_builder.
	(auto_vec_perm_indices): Delete.
	(vec_perm_builder): Redefine as a stand-alone class.
	(vec_perm_indices::vec_perm_indices): New function.
	(vec_perm_indices::clamp): Likewise.
	* vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
	(vec_perm_indices::new_vector): New function.
	(vec_perm_indices::new_expanded_vector): Update for new
	vec_perm_indices class.
	(vec_perm_indices::rotate_inputs): New function.
	(vec_perm_indices::all_in_range_p): Operate directly on the
	encoded form, without computing elided elements.
	(tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
	encoding.  Update for new vec_perm_indices class.
	* optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
	the given vec_perm_builder.
	(expand_vec_perm_var): Update vec_perm_builder constructor.
	(expand_mult_highpart): Use vec_perm_builder instead of
	auto_vec_perm_indices.
	* optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use a single
	or double series encoding as appropriate.
	* fold-const.c (fold_ternary_loc): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
	* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
	(vect_permute_store_chain): Likewise.
	(vect_grouped_load_supported): Likewise.
	(vect_permute_load_chain): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
	(vect_transform_slp_perm_load): Likewise.
	(vect_schedule_slp_instance): Likewise.
	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
	(vectorizable_mask_load_store): Likewise.
	(vectorizable_bswap): Likewise.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
	* tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use
	tree_to_vec_perm_builder to read the vector from a tree.
	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
	vec_perm_builder instead of a vec_perm_indices.
	(have_whole_vector_shift): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Leave the
	truncation to calc_vec_perm_mask_for_shift.
	(vect_create_epilog_for_reduction): Likewise.
	* config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(aarch64_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/arm/arm.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(arm_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
	Update vec_perm_builder constructor.
	(rs6000_expand_interleave): Likewise.
	* config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
	(rs6000_expand_interleave): Likewise.

Comments

Richard Biener Dec. 12, 2017, 3:32 p.m. | #1
On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> This patch changes vec_perm_indices from a plain vec<> to a class

> that stores a canonicalised permutation, using the same encoding

> as for VECTOR_CSTs.  This means that vec_perm_indices now carries

> information about the number of vectors being permuted (currently

> always 1 or 2) and the number of elements in each input vector.


Before I dive into  the C++ details can you explain why it needs this
info and how it encodes it for variable-length vectors?  To interleave
two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we
can directly encode N here, can we?  extract even/odd should just
work as { 0, 2, 4, 6, ...} without knowledge of whether we permute
one or two vectors (the one vector case just has two times the same
vector) or how many elements each of the vectors (or the result) has.

Richard.

> A new vec_perm_builder class is used to actually build up the vector,

> like tree_vector_builder does for trees.  vec_perm_indices is the

> completed representation, a bit like VECTOR_CST is for trees.

>

> The patch just does a mechanical conversion of the code to

> vec_perm_builder: a later patch uses explicit encodings where possible.

>

> The point of all this is that it makes the representation suitable

> for variable-length vectors.  It's no longer necessary for the

> underlying vec<>s to store every element explicitly.

>

> In int-vector-builder.h, "using the same encoding as tree and rtx constants"

> describes the endpoint -- adding the rtx encoding comes later.

>

>

> 2017-12-09  Richard Sandiford  <richard.sandiford@linaro.org>

>

> gcc/

>         * int-vector-builder.h: New file.

>         * vec-perm-indices.h: Include int-vector-builder.h.

>         (vec_perm_indices): Redefine as an int_vector_builder.

>         (auto_vec_perm_indices): Delete.

>         (vec_perm_builder): Redefine as a stand-alone class.

>         (vec_perm_indices::vec_perm_indices): New function.

>         (vec_perm_indices::clamp): Likewise.

>         * vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.

>         (vec_perm_indices::new_vector): New function.

>         (vec_perm_indices::new_expanded_vector): Update for new

>         vec_perm_indices class.

>         (vec_perm_indices::rotate_inputs): New function.

>         (vec_perm_indices::all_in_range_p): Operate directly on the

>         encoded form, without computing elided elements.

>         (tree_to_vec_perm_builder): Operate directly on the VECTOR_CST

>         encoding.  Update for new vec_perm_indices class.

>         * optabs.c (expand_vec_perm_const): Create a vec_perm_indices for

>         the given vec_perm_builder.

>         (expand_vec_perm_var): Update vec_perm_builder constructor.

>         (expand_mult_highpart): Use vec_perm_builder instead of

>         auto_vec_perm_indices.

>         * optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and

>         vec_perm_indices instead of auto_vec_perm_indices.  Use a single

>         or double series encoding as appropriate.

>         * fold-const.c (fold_ternary_loc): Use vec_perm_builder and

>         vec_perm_indices instead of auto_vec_perm_indices.

>         * tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.

>         * tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.

>         (vect_permute_store_chain): Likewise.

>         (vect_grouped_load_supported): Likewise.

>         (vect_permute_load_chain): Likewise.

>         (vect_shift_permute_load_chain): Likewise.

>         * tree-vect-slp.c (vect_build_slp_tree_1): Likewise.

>         (vect_transform_slp_perm_load): Likewise.

>         (vect_schedule_slp_instance): Likewise.

>         * tree-vect-stmts.c (perm_mask_for_reverse): Likewise.

>         (vectorizable_mask_load_store): Likewise.

>         (vectorizable_bswap): Likewise.

>         (vectorizable_store): Likewise.

>         (vectorizable_load): Likewise.

>         * tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and

>         vec_perm_indices instead of auto_vec_perm_indices.  Use

>         tree_to_vec_perm_builder to read the vector from a tree.

>         * tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a

>         vec_perm_builder instead of a vec_perm_indices.

>         (have_whole_vector_shift): Use vec_perm_builder and

>         vec_perm_indices instead of auto_vec_perm_indices.  Leave the

>         truncation to calc_vec_perm_mask_for_shift.

>         (vect_create_epilog_for_reduction): Likewise.

>         * config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change

>         from auto_vec_perm_indices to vec_perm_indices.

>         (aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm

>         instead of changing individual elements.

>         (aarch64_vectorize_vec_perm_const): Use new_vector to install

>         the vector in d.perm.

>         * config/arm/arm.c (expand_vec_perm_d::perm): Change

>         from auto_vec_perm_indices to vec_perm_indices.

>         (arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm

>         instead of changing individual elements.

>         (arm_vectorize_vec_perm_const): Use new_vector to install

>         the vector in d.perm.

>         * config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):

>         Update vec_perm_builder constructor.

>         (rs6000_expand_interleave): Likewise.

>         * config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.

>         (rs6000_expand_interleave): Likewise.

>

> Index: gcc/int-vector-builder.h

> ===================================================================

> --- /dev/null   2017-12-09 13:59:56.352713187 +0000

> +++ gcc/int-vector-builder.h    2017-12-09 22:48:47.545825268 +0000

> @@ -0,0 +1,90 @@

> +/* A class for building vector integer constants.

> +   Copyright (C) 2017 Free Software Foundation, Inc.

> +

> +This file is part of GCC.

> +

> +GCC is free software; you can redistribute it and/or modify it under

> +the terms of the GNU General Public License as published by the Free

> +Software Foundation; either version 3, or (at your option) any later

> +version.

> +

> +GCC is distributed in the hope that it will be useful, but WITHOUT ANY

> +WARRANTY; without even the implied warranty of MERCHANTABILITY or

> +FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License

> +for more details.

> +

> +You should have received a copy of the GNU General Public License

> +along with GCC; see the file COPYING3.  If not see

> +<http://www.gnu.org/licenses/>.  */

> +

> +#ifndef GCC_INT_VECTOR_BUILDER_H

> +#define GCC_INT_VECTOR_BUILDER_H 1

> +

> +#include "vector-builder.h"

> +

> +/* This class is used to build vectors of integer type T using the same

> +   encoding as tree and rtx constants.  See vector_builder for more

> +   details.  */

> +template<typename T>

> +class int_vector_builder : public vector_builder<T, int_vector_builder<T> >

> +{

> +  typedef vector_builder<T, int_vector_builder> parent;

> +  friend class vector_builder<T, int_vector_builder>;

> +

> +public:

> +  int_vector_builder () {}

> +  int_vector_builder (unsigned int, unsigned int, unsigned int);

> +

> +  using parent::new_vector;

> +

> +private:

> +  bool equal_p (T, T) const;

> +  bool allow_steps_p () const { return true; }

> +  bool integral_p (T) const { return true; }

> +  T step (T, T) const;

> +  T apply_step (T, unsigned int, T) const;

> +  bool can_elide_p (T) const { return true; }

> +  void note_representative (T *, T) {}

> +};

> +

> +/* Create a new builder for a vector with FULL_NELTS elements.

> +   Initially encode the value as NPATTERNS interleaved patterns with

> +   NELTS_PER_PATTERN elements each.  */

> +

> +template<typename T>

> +inline

> +int_vector_builder<T>::int_vector_builder (unsigned int full_nelts,

> +                                          unsigned int npatterns,

> +                                          unsigned int nelts_per_pattern)

> +{

> +  new_vector (full_nelts, npatterns, nelts_per_pattern);

> +}

> +

> +/* Return true if elements ELT1 and ELT2 are equal.  */

> +

> +template<typename T>

> +inline bool

> +int_vector_builder<T>::equal_p (T elt1, T elt2) const

> +{

> +  return elt1 == elt2;

> +}

> +

> +/* Return the value of element ELT2 minus the value of element ELT1.  */

> +

> +template<typename T>

> +inline T

> +int_vector_builder<T>::step (T elt1, T elt2) const

> +{

> +  return elt2 - elt1;

> +}

> +

> +/* Return a vector element with the value BASE + FACTOR * STEP.  */

> +

> +template<typename T>

> +inline T

> +int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const

> +{

> +  return base + factor * step;

> +}

> +

> +#endif

> Index: gcc/vec-perm-indices.h

> ===================================================================

> --- gcc/vec-perm-indices.h      2017-12-09 22:47:27.885318101 +0000

> +++ gcc/vec-perm-indices.h      2017-12-09 22:48:47.548825399 +0000

> @@ -20,30 +20,102 @@ Software Foundation; either version 3, o

>  #ifndef GCC_VEC_PERN_INDICES_H

>  #define GCC_VEC_PERN_INDICES_H 1

>

> +#include "int-vector-builder.h"

> +

> +/* A vector_builder for building constant permutation vectors.

> +   The elements do not need to be clamped to a particular range

> +   of input elements.  */

> +typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder;

> +

>  /* This class represents a constant permutation vector, such as that used

> -   as the final operand to a VEC_PERM_EXPR.  */

> -class vec_perm_indices : public auto_vec<unsigned short, 32>

> +   as the final operand to a VEC_PERM_EXPR.  The vector is canonicalized

> +   for a particular number of input vectors and for a particular number

> +   of elements per input.  The class copes with cases in which the

> +   input and output vectors have different numbers of elements.  */

> +class vec_perm_indices

>  {

> -  typedef unsigned short element_type;

> -  typedef auto_vec<element_type, 32> parent_type;

> +  typedef HOST_WIDE_INT element_type;

>

>  public:

> -  vec_perm_indices () {}

> -  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}

> +  vec_perm_indices ();

> +  vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int);

>

> +  void new_vector (const vec_perm_builder &, unsigned int, unsigned int);

>    void new_expanded_vector (const vec_perm_indices &, unsigned int);

> +  void rotate_inputs (int delta);

> +

> +  /* Return the underlying vector encoding.  */

> +  const vec_perm_builder &encoding () const { return m_encoding; }

> +

> +  /* Return the number of output elements.  This is called length ()

> +     so that we present a more vec-like interface.  */

> +  unsigned int length () const { return m_encoding.full_nelts (); }

> +

> +  /* Return the number of input vectors being permuted.  */

> +  unsigned int ninputs () const { return m_ninputs; }

>

> +  /* Return the number of elements in each input vector.  */

> +  unsigned int nelts_per_input () const { return m_nelts_per_input; }

> +

> +  /* Return the total number of input elements.  */

> +  unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; }

> +

> +  element_type clamp (element_type) const;

> +  element_type operator[] (unsigned int i) const;

>    bool all_in_range_p (element_type, element_type) const;

>

>  private:

>    vec_perm_indices (const vec_perm_indices &);

> -};

>

> -/* Temporary.  */

> -typedef vec_perm_indices vec_perm_builder;

> -typedef vec_perm_indices auto_vec_perm_indices;

> +  vec_perm_builder m_encoding;

> +  unsigned int m_ninputs;

> +  unsigned int m_nelts_per_input;

> +};

>

>  bool tree_to_vec_perm_builder (vec_perm_builder *, tree);

>  rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);

>

> +inline

> +vec_perm_indices::vec_perm_indices ()

> +  : m_ninputs (0),

> +    m_nelts_per_input (0)

> +{

> +}

> +

> +/* Construct a permutation vector that selects between NINPUTS vector

> +   inputs that have NELTS_PER_INPUT elements each.  Take the elements of

> +   the new vector from ELEMENTS, clamping each one to be in range.  */

> +

> +inline

> +vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements,

> +                                   unsigned int ninputs,

> +                                   unsigned int nelts_per_input)

> +{

> +  new_vector (elements, ninputs, nelts_per_input);

> +}

> +

> +/* Return the canonical value for permutation vector element ELT,

> +   taking into account the current number of input elements.  */

> +

> +inline vec_perm_indices::element_type

> +vec_perm_indices::clamp (element_type elt) const

> +{

> +  element_type limit = input_nelts ();

> +  elt %= limit;

> +  /* Treat negative elements as counting from the end.  This only matters

> +     if the vector size is not a power of 2.  */

> +  if (elt < 0)

> +    elt += limit;

> +  return elt;

> +}

> +

> +/* Return the value of vector element I, which might or might not be

> +   explicitly encoded.  */

> +

> +inline vec_perm_indices::element_type

> +vec_perm_indices::operator[] (unsigned int i) const

> +{

> +  return clamp (m_encoding.elt (i));

> +}

> +

>  #endif

> Index: gcc/vec-perm-indices.c

> ===================================================================

> --- gcc/vec-perm-indices.c      2017-12-09 22:47:27.885318101 +0000

> +++ gcc/vec-perm-indices.c      2017-12-09 22:48:47.548825399 +0000

> @@ -22,11 +22,33 @@ Software Foundation; either version 3, o

>  #include "coretypes.h"

>  #include "vec-perm-indices.h"

>  #include "tree.h"

> +#include "fold-const.h"

> +#include "tree-vector-builder.h"

>  #include "backend.h"

>  #include "rtl.h"

>  #include "memmodel.h"

>  #include "emit-rtl.h"

>

> +/* Switch to a new permutation vector that selects between NINPUTS vector

> +   inputs that have NELTS_PER_INPUT elements each.  Take the elements of the

> +   new permutation vector from ELEMENTS, clamping each one to be in range.  */

> +

> +void

> +vec_perm_indices::new_vector (const vec_perm_builder &elements,

> +                             unsigned int ninputs,

> +                             unsigned int nelts_per_input)

> +{

> +  m_ninputs = ninputs;

> +  m_nelts_per_input = nelts_per_input;

> +  /* Expand the encoding and clamp each element.  E.g. { 0, 2, 4, ... }

> +     might wrap halfway if there is only one vector input.  */

> +  unsigned int full_nelts = elements.full_nelts ();

> +  m_encoding.new_vector (full_nelts, full_nelts, 1);

> +  for (unsigned int i = 0; i < full_nelts; ++i)

> +    m_encoding.quick_push (clamp (elements.elt (i)));

> +  m_encoding.finalize ();

> +}

> +

>  /* Switch to a new permutation vector that selects the same input elements

>     as ORIG, but with each element split into FACTOR pieces.  For example,

>     if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is

> @@ -36,14 +58,31 @@ Software Foundation; either version 3, o

>  vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,

>                                        unsigned int factor)

>  {

> -  truncate (0);

> -  reserve (orig.length () * factor);

> -  for (unsigned int i = 0; i < orig.length (); ++i)

> +  m_ninputs = orig.m_ninputs;

> +  m_nelts_per_input = orig.m_nelts_per_input * factor;

> +  m_encoding.new_vector (orig.m_encoding.full_nelts () * factor,

> +                        orig.m_encoding.npatterns () * factor,

> +                        orig.m_encoding.nelts_per_pattern ());

> +  unsigned int encoded_nelts = orig.m_encoding.encoded_nelts ();

> +  for (unsigned int i = 0; i < encoded_nelts; ++i)

>      {

> -      element_type base = orig[i] * factor;

> +      element_type base = orig.m_encoding[i] * factor;

>        for (unsigned int j = 0; j < factor; ++j)

> -       quick_push (base + j);

> +       m_encoding.quick_push (base + j);

>      }

> +  m_encoding.finalize ();

> +}

> +

> +/* Rotate the inputs of the permutation right by DELTA inputs.  This changes

> +   the values of the permutation vector but it doesn't change the way that

> +   the elements are encoded.  */

> +

> +void

> +vec_perm_indices::rotate_inputs (int delta)

> +{

> +  element_type element_delta = delta * m_nelts_per_input;

> +  for (unsigned int i = 0; i < m_encoding.length (); ++i)

> +    m_encoding[i] = clamp (m_encoding[i] + element_delta);

>  }

>

>  /* Return true if all elements of the permutation vector are in the range

> @@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c

>  bool

>  vec_perm_indices::all_in_range_p (element_type start, element_type size) const

>  {

> -  for (unsigned int i = 0; i < length (); ++i)

> -    if ((*this)[i] < start || ((*this)[i] - start) >= size)

> +  /* Check the first two elements of each pattern.  */

> +  unsigned int npatterns = m_encoding.npatterns ();

> +  unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern ();

> +  unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2);

> +  for (unsigned int i = 0; i < base_nelts; ++i)

> +    if (m_encoding[i] < start || (m_encoding[i] - start) >= size)

>        return false;

> +

> +  /* For stepped encodings, check the full range of the series.  */

> +  if (nelts_per_pattern == 3)

> +    {

> +      element_type limit = input_nelts ();

> +

> +      /* The number of elements in each pattern beyond the first two

> +        that we checked above.  */

> +      unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2;

> +      for (unsigned int i = 0; i < npatterns; ++i)

> +       {

> +         /* BASE1 has been checked but BASE2 hasn't.   */

> +         element_type base1 = m_encoding[i + npatterns];

> +         element_type base2 = m_encoding[i + base_nelts];

> +

> +         /* The step to add to get from BASE1 to each subsequent value.  */

> +         element_type step = clamp (base2 - base1);

> +

> +         /* STEP has no inherent sign, so a value near LIMIT can

> +            act as a negative step.  The series is in range if it

> +            is in range according to one of the two interpretations.

> +

> +            Since we're dealing with clamped values, ELEMENT_TYPE is

> +            wide enough for overflow not to be a problem.  */

> +         element_type headroom_down = base1 - start;

> +         element_type headroom_up = size - headroom_down - 1;

> +         if (headroom_up < step * step_nelts

> +             && headroom_down < (limit - step) * step_nelts)

> +           return false;

> +       }

> +    }

>    return true;

>  }

>

> @@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen

>  bool

>  tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)

>  {

> -  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));

> -  for (unsigned int i = 0; i < nelts; ++i)

> -    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))

> +  unsigned int encoded_nelts = vector_cst_encoded_nelts (cst);

> +  for (unsigned int i = 0; i < encoded_nelts; ++i)

> +    if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i)))

>        return false;

>

> -  builder->reserve (nelts);

> -  for (unsigned int i = 0; i < nelts; ++i)

> -    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))

> -                        & (2 * nelts - 1));

> +  builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)),

> +                      VECTOR_CST_NPATTERNS (cst),

> +                      VECTOR_CST_NELTS_PER_PATTERN (cst));

> +  for (unsigned int i = 0; i < encoded_nelts; ++i)

> +    builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i)));

>    return true;

>  }

>

> Index: gcc/optabs.c

> ===================================================================

> --- gcc/optabs.c        2017-12-09 22:47:27.881318099 +0000

> +++ gcc/optabs.c        2017-12-09 22:48:47.546825312 +0000

> @@ -5456,6 +5456,11 @@ expand_vec_perm_const (machine_mode mode

>    rtx_insn *last = get_last_insn ();

>

>    bool single_arg_p = rtx_equal_p (v0, v1);

> +  /* Always specify two input vectors here and leave the target to handle

> +     cases in which the inputs are equal.  Not all backends can cope with

> +     the single-input representation when testing for a double-input

> +     target instruction.  */

> +  vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));

>

>    /* See if this can be handled with a vec_shr.  We only do this if the

>       second vector is all zeroes.  */

> @@ -5468,7 +5473,7 @@ expand_vec_perm_const (machine_mode mode

>        && (shift_code != CODE_FOR_nothing

>           || shift_code_qi != CODE_FOR_nothing))

>      {

> -      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);

> +      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);

>        if (shift_amt)

>         {

>           struct expand_operand ops[3];

> @@ -5500,7 +5505,7 @@ expand_vec_perm_const (machine_mode mode

>        else

>         v1 = force_reg (mode, v1);

>

> -      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))

> +      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))

>         return target;

>      }

>

> @@ -5509,7 +5514,7 @@ expand_vec_perm_const (machine_mode mode

>    rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;

>    if (qimode != VOIDmode)

>      {

> -      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));

> +      qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode));

>        target_qi = gen_reg_rtx (qimode);

>        v0_qi = gen_lowpart (qimode, v0);

>        v1_qi = gen_lowpart (qimode, v1);

> @@ -5536,7 +5541,7 @@ expand_vec_perm_const (machine_mode mode

>       REQUIRED_SEL_MODE is OK.  */

>    if (sel_mode != required_sel_mode)

>      {

> -      if (!selector_fits_mode_p (required_sel_mode, sel))

> +      if (!selector_fits_mode_p (required_sel_mode, indices))

>         {

>           delete_insns_since (last);

>           return NULL_RTX;

> @@ -5547,7 +5552,7 @@ expand_vec_perm_const (machine_mode mode

>    insn_code icode = direct_optab_handler (vec_perm_optab, mode);

>    if (icode != CODE_FOR_nothing)

>      {

> -      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);

> +      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices);

>        rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);

>        if (tmp)

>         return tmp;

> @@ -5621,7 +5626,7 @@ expand_vec_perm_var (machine_mode mode,

>    gcc_assert (sel != NULL);

>

>    /* Broadcast the low byte each element into each of its bytes.  */

> -  vec_perm_builder const_sel (w);

> +  vec_perm_builder const_sel (w, w, 1);

>    for (i = 0; i < w; ++i)

>      {

>        int this_e = i / u * u;

> @@ -5848,7 +5853,7 @@ expand_mult_highpart (machine_mode mode,

>    expand_insn (optab_handler (tab2, mode), 3, eops);

>    m2 = gen_lowpart (mode, eops[0].value);

>

> -  auto_vec_perm_indices sel (nunits);

> +  vec_perm_builder sel (nunits, nunits, 1);

>    if (method == 2)

>      {

>        for (i = 0; i < nunits; ++i)

> Index: gcc/optabs-query.c

> ===================================================================

> --- gcc/optabs-query.c  2017-12-09 22:47:27.881318099 +0000

> +++ gcc/optabs-query.c  2017-12-09 22:48:47.545825268 +0000

> @@ -501,12 +501,13 @@ can_mult_highpart_p (machine_mode mode,

>        op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;

>        if (optab_handler (op, mode) != CODE_FOR_nothing)

>         {

> -         auto_vec_perm_indices sel (nunits);

> +         vec_perm_builder sel (nunits, nunits, 1);

>           for (i = 0; i < nunits; ++i)

>             sel.quick_push (!BYTES_BIG_ENDIAN

>                             + (i & ~1)

>                             + ((i & 1) ? nunits : 0));

> -         if (can_vec_perm_const_p (mode, sel))

> +         vec_perm_indices indices (sel, 2, nunits);

> +         if (can_vec_perm_const_p (mode, indices))

>             return 2;

>         }

>      }

> @@ -517,10 +518,11 @@ can_mult_highpart_p (machine_mode mode,

>        op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;

>        if (optab_handler (op, mode) != CODE_FOR_nothing)

>         {

> -         auto_vec_perm_indices sel (nunits);

> +         vec_perm_builder sel (nunits, nunits, 1);

>           for (i = 0; i < nunits; ++i)

>             sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));

> -         if (can_vec_perm_const_p (mode, sel))

> +         vec_perm_indices indices (sel, 2, nunits);

> +         if (can_vec_perm_const_p (mode, indices))

>             return 3;

>         }

>      }

> Index: gcc/fold-const.c

> ===================================================================

> --- gcc/fold-const.c    2017-12-09 22:47:27.881318099 +0000

> +++ gcc/fold-const.c    2017-12-09 22:48:47.545825268 +0000

> @@ -11217,7 +11217,7 @@ fold_ternary_loc (location_t loc, enum t

>             {

>               unsigned int nelts = VECTOR_CST_NELTS (arg0), i;

>               gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));

> -             auto_vec_perm_indices sel (nelts);

> +             vec_perm_builder sel (nelts, nelts, 1);

>               for (i = 0; i < nelts; i++)

>                 {

>                   tree val = VECTOR_CST_ELT (arg0, i);

> @@ -11228,7 +11228,8 @@ fold_ternary_loc (location_t loc, enum t

>                   else /* Currently unreachable.  */

>                     return NULL_TREE;

>                 }

> -             tree t = fold_vec_perm (type, arg1, arg2, sel);

> +             tree t = fold_vec_perm (type, arg1, arg2,

> +                                     vec_perm_indices (sel, 2, nelts));

>               if (t != NULL_TREE)

>                 return t;

>             }

> @@ -11558,8 +11559,8 @@ fold_ternary_loc (location_t loc, enum t

>           mask2 = 2 * nelts - 1;

>           mask = single_arg ? (nelts - 1) : mask2;

>           gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));

> -         auto_vec_perm_indices sel (nelts);

> -         auto_vec_perm_indices sel2 (nelts);

> +         vec_perm_builder sel (nelts, nelts, 1);

> +         vec_perm_builder sel2 (nelts, nelts, 1);

>           for (i = 0; i < nelts; i++)

>             {

>               tree val = VECTOR_CST_ELT (arg2, i);

> @@ -11604,12 +11605,13 @@ fold_ternary_loc (location_t loc, enum t

>               need_mask_canon = true;

>             }

>

> +         vec_perm_indices indices (sel, 2, nelts);

>           if ((TREE_CODE (op0) == VECTOR_CST

>                || TREE_CODE (op0) == CONSTRUCTOR)

>               && (TREE_CODE (op1) == VECTOR_CST

>                   || TREE_CODE (op1) == CONSTRUCTOR))

>             {

> -             tree t = fold_vec_perm (type, op0, op1, sel);

> +             tree t = fold_vec_perm (type, op0, op1, indices);

>               if (t != NULL_TREE)

>                 return t;

>             }

> @@ -11621,11 +11623,14 @@ fold_ternary_loc (location_t loc, enum t

>              argument permutation while still allowing an equivalent

>              2-argument version.  */

>           if (need_mask_canon && arg2 == op2

> -             && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)

> -             && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))

> +             && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)

> +             && can_vec_perm_const_p (TYPE_MODE (type),

> +                                      vec_perm_indices (sel2, 2, nelts),

> +                                      false))

>             {

>               need_mask_canon = need_mask_canon2;

> -             sel = sel2;

> +             sel.truncate (0);

> +             sel.splice (sel2);

>             }

>

>           if (need_mask_canon && arg2 == op2)

> Index: gcc/tree-ssa-forwprop.c

> ===================================================================

> --- gcc/tree-ssa-forwprop.c     2017-12-09 22:47:27.883318100 +0000

> +++ gcc/tree-ssa-forwprop.c     2017-12-09 22:48:47.546825312 +0000

> @@ -2019,7 +2019,7 @@ simplify_vector_constructor (gimple_stmt

>    elem_type = TREE_TYPE (type);

>    elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));

>

> -  auto_vec_perm_indices sel (nelts);

> +  vec_perm_builder sel (nelts, nelts, 1);

>    orig = NULL;

>    conv_code = ERROR_MARK;

>    maybe_ident = true;

> @@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt

>      {

>        tree mask_type;

>

> -      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))

> +      vec_perm_indices indices (sel, 1, nelts);

> +      if (!can_vec_perm_const_p (TYPE_MODE (type), indices))

>         return false;

>        mask_type

>         = build_vector_type (build_nonstandard_integer_type (elem_size, 1),

> Index: gcc/tree-vect-data-refs.c

> ===================================================================

> --- gcc/tree-vect-data-refs.c   2017-12-09 22:47:27.883318100 +0000

> +++ gcc/tree-vect-data-refs.c   2017-12-09 22:48:47.546825312 +0000

> @@ -4566,7 +4566,7 @@ vect_grouped_store_supported (tree vecty

>    if (VECTOR_MODE_P (mode))

>      {

>        unsigned int i, nelt = GET_MODE_NUNITS (mode);

> -      auto_vec_perm_indices sel (nelt);

> +      vec_perm_builder sel (nelt, nelt, 1);

>        sel.quick_grow (nelt);

>

>        if (count == 3)

> @@ -4574,6 +4574,7 @@ vect_grouped_store_supported (tree vecty

>           unsigned int j0 = 0, j1 = 0, j2 = 0;

>           unsigned int i, j;

>

> +         vec_perm_indices indices;

>           for (j = 0; j < 3; j++)

>             {

>               int nelt0 = ((3 - j) * nelt) % 3;

> @@ -4588,7 +4589,8 @@ vect_grouped_store_supported (tree vecty

>                   if (3 * i + nelt2 < nelt)

>                     sel[3 * i + nelt2] = 0;

>                 }

> -             if (!can_vec_perm_const_p (mode, sel))

> +             indices.new_vector (sel, 2, nelt);

> +             if (!can_vec_perm_const_p (mode, indices))

>                 {

>                   if (dump_enabled_p ())

>                     dump_printf (MSG_MISSED_OPTIMIZATION,

> @@ -4605,7 +4607,8 @@ vect_grouped_store_supported (tree vecty

>                   if (3 * i + nelt2 < nelt)

>                     sel[3 * i + nelt2] = nelt + j2++;

>                 }

> -             if (!can_vec_perm_const_p (mode, sel))

> +             indices.new_vector (sel, 2, nelt);

> +             if (!can_vec_perm_const_p (mode, indices))

>                 {

>                   if (dump_enabled_p ())

>                     dump_printf (MSG_MISSED_OPTIMIZATION,

> @@ -4625,11 +4628,13 @@ vect_grouped_store_supported (tree vecty

>               sel[i * 2] = i;

>               sel[i * 2 + 1] = i + nelt;

>             }

> -         if (can_vec_perm_const_p (mode, sel))

> +         vec_perm_indices indices (sel, 2, nelt);

> +         if (can_vec_perm_const_p (mode, indices))

>             {

>               for (i = 0; i < nelt; i++)

>                 sel[i] += nelt / 2;

> -             if (can_vec_perm_const_p (mode, sel))

> +             indices.new_vector (sel, 2, nelt);

> +             if (can_vec_perm_const_p (mode, indices))

>                 return true;

>             }

>         }

> @@ -4731,7 +4736,7 @@ vect_permute_store_chain (vec<tree> dr_c

>    unsigned int i, n, log_length = exact_log2 (length);

>    unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);

>

> -  auto_vec_perm_indices sel (nelt);

> +  vec_perm_builder sel (nelt, nelt, 1);

>    sel.quick_grow (nelt);

>

>    result_chain->quick_grow (length);

> @@ -4742,6 +4747,7 @@ vect_permute_store_chain (vec<tree> dr_c

>      {

>        unsigned int j0 = 0, j1 = 0, j2 = 0;

>

> +      vec_perm_indices indices;

>        for (j = 0; j < 3; j++)

>          {

>           int nelt0 = ((3 - j) * nelt) % 3;

> @@ -4757,7 +4763,8 @@ vect_permute_store_chain (vec<tree> dr_c

>               if (3 * i + nelt2 < nelt)

>                 sel[3 * i + nelt2] = 0;

>             }

> -         perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);

> +         indices.new_vector (sel, 2, nelt);

> +         perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);

>

>           for (i = 0; i < nelt; i++)

>             {

> @@ -4768,7 +4775,8 @@ vect_permute_store_chain (vec<tree> dr_c

>               if (3 * i + nelt2 < nelt)

>                 sel[3 * i + nelt2] = nelt + j2++;

>             }

> -         perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);

> +         indices.new_vector (sel, 2, nelt);

> +         perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);

>

>           vect1 = dr_chain[0];

>           vect2 = dr_chain[1];

> @@ -4805,11 +4813,13 @@ vect_permute_store_chain (vec<tree> dr_c

>           sel[i * 2] = i;

>           sel[i * 2 + 1] = i + nelt;

>         }

> -       perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);

> +       vec_perm_indices indices (sel, 2, nelt);

> +       perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);

>

>         for (i = 0; i < nelt; i++)

>           sel[i] += nelt / 2;

> -       perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);

> +       indices.new_vector (sel, 2, nelt);

> +       perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);

>

>         for (i = 0, n = log_length; i < n; i++)

>           {

> @@ -5154,11 +5164,12 @@ vect_grouped_load_supported (tree vectyp

>    if (VECTOR_MODE_P (mode))

>      {

>        unsigned int i, j, nelt = GET_MODE_NUNITS (mode);

> -      auto_vec_perm_indices sel (nelt);

> +      vec_perm_builder sel (nelt, nelt, 1);

>        sel.quick_grow (nelt);

>

>        if (count == 3)

>         {

> +         vec_perm_indices indices;

>           unsigned int k;

>           for (k = 0; k < 3; k++)

>             {

> @@ -5167,7 +5178,8 @@ vect_grouped_load_supported (tree vectyp

>                   sel[i] = 3 * i + k;

>                 else

>                   sel[i] = 0;

> -             if (!can_vec_perm_const_p (mode, sel))

> +             indices.new_vector (sel, 2, nelt);

> +             if (!can_vec_perm_const_p (mode, indices))

>                 {

>                   if (dump_enabled_p ())

>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> @@ -5180,7 +5192,8 @@ vect_grouped_load_supported (tree vectyp

>                   sel[i] = i;

>                 else

>                   sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);

> -             if (!can_vec_perm_const_p (mode, sel))

> +             indices.new_vector (sel, 2, nelt);

> +             if (!can_vec_perm_const_p (mode, indices))

>                 {

>                   if (dump_enabled_p ())

>                     dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> @@ -5195,13 +5208,16 @@ vect_grouped_load_supported (tree vectyp

>         {

>           /* If length is not equal to 3 then only power of 2 is supported.  */

>           gcc_assert (pow2p_hwi (count));

> +

>           for (i = 0; i < nelt; i++)

>             sel[i] = i * 2;

> -         if (can_vec_perm_const_p (mode, sel))

> +         vec_perm_indices indices (sel, 2, nelt);

> +         if (can_vec_perm_const_p (mode, indices))

>             {

>               for (i = 0; i < nelt; i++)

>                 sel[i] = i * 2 + 1;

> -             if (can_vec_perm_const_p (mode, sel))

> +             indices.new_vector (sel, 2, nelt);

> +             if (can_vec_perm_const_p (mode, indices))

>                 return true;

>             }

>          }

> @@ -5316,7 +5332,7 @@ vect_permute_load_chain (vec<tree> dr_ch

>    unsigned int i, j, log_length = exact_log2 (length);

>    unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);

>

> -  auto_vec_perm_indices sel (nelt);

> +  vec_perm_builder sel (nelt, nelt, 1);

>    sel.quick_grow (nelt);

>

>    result_chain->quick_grow (length);

> @@ -5327,6 +5343,7 @@ vect_permute_load_chain (vec<tree> dr_ch

>      {

>        unsigned int k;

>

> +      vec_perm_indices indices;

>        for (k = 0; k < 3; k++)

>         {

>           for (i = 0; i < nelt; i++)

> @@ -5334,15 +5351,16 @@ vect_permute_load_chain (vec<tree> dr_ch

>               sel[i] = 3 * i + k;

>             else

>               sel[i] = 0;

> -         perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);

> +         indices.new_vector (sel, 2, nelt);

> +         perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);

>

>           for (i = 0, j = 0; i < nelt; i++)

>             if (3 * i + k < 2 * nelt)

>               sel[i] = i;

>             else

>               sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);

> -

> -         perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);

> +         indices.new_vector (sel, 2, nelt);

> +         perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);

>

>           first_vect = dr_chain[0];

>           second_vect = dr_chain[1];

> @@ -5374,11 +5392,13 @@ vect_permute_load_chain (vec<tree> dr_ch

>

>        for (i = 0; i < nelt; ++i)

>         sel[i] = i * 2;

> -      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);

> +      vec_perm_indices indices (sel, 2, nelt);

> +      perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);

>

>        for (i = 0; i < nelt; ++i)

>         sel[i] = i * 2 + 1;

> -      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);

> +      indices.new_vector (sel, 2, nelt);

> +      perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);

>

>        for (i = 0; i < log_length; i++)

>         {

> @@ -5514,7 +5534,7 @@ vect_shift_permute_load_chain (vec<tree>

>    stmt_vec_info stmt_info = vinfo_for_stmt (stmt);

>    loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);

>

> -  auto_vec_perm_indices sel (nelt);

> +  vec_perm_builder sel (nelt, nelt, 1);

>    sel.quick_grow (nelt);

>

>    result_chain->quick_grow (length);

> @@ -5528,7 +5548,8 @@ vect_shift_permute_load_chain (vec<tree>

>         sel[i] = i * 2;

>        for (i = 0; i < nelt / 2; ++i)

>         sel[nelt / 2 + i] = i * 2 + 1;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      vec_perm_indices indices (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> @@ -5536,13 +5557,14 @@ vect_shift_permute_load_chain (vec<tree>

>                               supported by target\n");

>           return false;

>         }

> -      perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);

> +      perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices);

>

>        for (i = 0; i < nelt / 2; ++i)

>         sel[i] = i * 2 + 1;

>        for (i = 0; i < nelt / 2; ++i)

>         sel[nelt / 2 + i] = i * 2;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> @@ -5550,20 +5572,21 @@ vect_shift_permute_load_chain (vec<tree>

>                               supported by target\n");

>           return false;

>         }

> -      perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);

> +      perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices);

>

>        /* Generating permutation constant to shift all elements.

>          For vector length 8 it is {4 5 6 7 8 9 10 11}.  */

>        for (i = 0; i < nelt; i++)

>         sel[i] = nelt / 2 + i;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                              "shift permutation is not supported by target\n");

>           return false;

>         }

> -      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);

> +      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);

>

>        /* Generating permutation constant to select vector from 2.

>          For vector length 8 it is {0 1 2 3 12 13 14 15}.  */

> @@ -5571,14 +5594,15 @@ vect_shift_permute_load_chain (vec<tree>

>         sel[i] = i;

>        for (i = nelt / 2; i < nelt; i++)

>         sel[i] = nelt + i;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                              "select is not supported by target\n");

>           return false;

>         }

> -      select_mask = vect_gen_perm_mask_checked (vectype, sel);

> +      select_mask = vect_gen_perm_mask_checked (vectype, indices);

>

>        for (i = 0; i < log_length; i++)

>         {

> @@ -5634,7 +5658,8 @@ vect_shift_permute_load_chain (vec<tree>

>           sel[i] = 3 * k + (l % 3);

>           k++;

>         }

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      vec_perm_indices indices (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

> @@ -5642,59 +5667,63 @@ vect_shift_permute_load_chain (vec<tree>

>                               supported by target\n");

>           return false;

>         }

> -      perm3_mask = vect_gen_perm_mask_checked (vectype, sel);

> +      perm3_mask = vect_gen_perm_mask_checked (vectype, indices);

>

>        /* Generating permutation constant to shift all elements.

>          For vector length 8 it is {6 7 8 9 10 11 12 13}.  */

>        for (i = 0; i < nelt; i++)

>         sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                              "shift permutation is not supported by target\n");

>           return false;

>         }

> -      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);

> +      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);

>

>        /* Generating permutation constant to shift all elements.

>          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */

>        for (i = 0; i < nelt; i++)

>         sel[i] = 2 * (nelt / 3) + 1 + i;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                              "shift permutation is not supported by target\n");

>           return false;

>         }

> -      shift2_mask = vect_gen_perm_mask_checked (vectype, sel);

> +      shift2_mask = vect_gen_perm_mask_checked (vectype, indices);

>

>        /* Generating permutation constant to shift all elements.

>          For vector length 8 it is {3 4 5 6 7 8 9 10}.  */

>        for (i = 0; i < nelt; i++)

>         sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                              "shift permutation is not supported by target\n");

>           return false;

>         }

> -      shift3_mask = vect_gen_perm_mask_checked (vectype, sel);

> +      shift3_mask = vect_gen_perm_mask_checked (vectype, indices);

>

>        /* Generating permutation constant to shift all elements.

>          For vector length 8 it is {5 6 7 8 9 10 11 12}.  */

>        for (i = 0; i < nelt; i++)

>         sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           if (dump_enabled_p ())

>             dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,

>                              "shift permutation is not supported by target\n");

>           return false;

>         }

> -      shift4_mask = vect_gen_perm_mask_checked (vectype, sel);

> +      shift4_mask = vect_gen_perm_mask_checked (vectype, indices);

>

>        for (k = 0; k < 3; k++)

>         {

> Index: gcc/tree-vect-slp.c

> ===================================================================

> --- gcc/tree-vect-slp.c 2017-12-09 22:47:27.884318101 +0000

> +++ gcc/tree-vect-slp.c 2017-12-09 22:48:47.547825355 +0000

> @@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>        && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)

>      {

>        unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);

> -      auto_vec_perm_indices sel (count);

> +      vec_perm_builder sel (count, count, 1);

>        for (i = 0; i < count; ++i)

>         {

>           unsigned int elt = i;

> @@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,

>             elt += count;

>           sel.quick_push (elt);

>         }

> -      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +      vec_perm_indices indices (sel, 2, count);

> +      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>         {

>           for (i = 0; i < group_size; ++i)

>             if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)

> @@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n

>      (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);

>    mask_type = get_vectype_for_scalar_type (mask_element_type);

>    nunits = TYPE_VECTOR_SUBPARTS (vectype);

> -  auto_vec_perm_indices mask (nunits);

> +  vec_perm_builder mask (nunits, nunits, 1);

>    mask.quick_grow (nunits);

> +  vec_perm_indices indices;

>

>    /* Initialize the vect stmts of NODE to properly insert the generated

>       stmts later.  */

> @@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n

>             noop_p = false;

>           mask[index++] = mask_element;

>

> -         if (index == nunits)

> +         if (index == nunits && !noop_p)

>             {

> -             if (! noop_p

> -                 && ! can_vec_perm_const_p (mode, mask))

> +             indices.new_vector (mask, 2, nunits);

> +             if (!can_vec_perm_const_p (mode, indices))

>                 {

>                   if (dump_enabled_p ())

>                     {

> @@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n

>                                        vect_location,

>                                        "unsupported vect permute { ");

>                       for (i = 0; i < nunits; ++i)

> -                       dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);

> +                       dump_printf (MSG_MISSED_OPTIMIZATION,

> +                                    HOST_WIDE_INT_PRINT_DEC " ", mask[i]);

>                       dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");

>                     }

>                   gcc_assert (analyze_only);

>                   return false;

>                 }

>

> -             if (! noop_p)

> -               ++*n_perms;

> +             ++*n_perms;

> +           }

>

> +         if (index == nunits)

> +           {

>               if (!analyze_only)

>                 {

>                   tree mask_vec = NULL_TREE;

> @@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod

>        enum tree_code code0 = gimple_assign_rhs_code (stmt);

>        enum tree_code ocode = ERROR_MARK;

>        gimple *ostmt;

> -      auto_vec_perm_indices mask (group_size);

> +      vec_perm_builder mask (group_size, group_size, 1);

>        FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt)

>         if (gimple_assign_rhs_code (ostmt) != code0)

>           {

> Index: gcc/tree-vect-stmts.c

> ===================================================================

> --- gcc/tree-vect-stmts.c       2017-12-09 22:47:27.885318101 +0000

> +++ gcc/tree-vect-stmts.c       2017-12-09 22:48:47.548825399 +0000

> @@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype)

>

>    nunits = TYPE_VECTOR_SUBPARTS (vectype);

>

> -  auto_vec_perm_indices sel (nunits);

> +  vec_perm_builder sel (nunits, nunits, 1);

>    for (i = 0; i < nunits; ++i)

>      sel.quick_push (nunits - 1 - i);

>

> -  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))

> +  vec_perm_indices indices (sel, 1, nunits);

> +  if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))

>      return NULL_TREE;

> -  return vect_gen_perm_mask_checked (vectype, sel);

> +  return vect_gen_perm_mask_checked (vectype, indices);

>  }

>

>  /* A subroutine of get_load_store_type, with a subset of the same

> @@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st

>         {

>           modifier = WIDEN;

>

> -         auto_vec_perm_indices sel (gather_off_nunits);

> +         vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);

>           for (i = 0; i < gather_off_nunits; ++i)

>             sel.quick_push (i | nunits);

>

> -         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);

> +         vec_perm_indices indices (sel, 1, gather_off_nunits);

> +         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,

> +                                                 indices);

>         }

>        else if (nunits == gather_off_nunits * 2)

>         {

>           modifier = NARROW;

>

> -         auto_vec_perm_indices sel (nunits);

> +         vec_perm_builder sel (nunits, nunits, 1);

>           sel.quick_grow (nunits);

>           for (i = 0; i < nunits; ++i)

>             sel[i] = i < gather_off_nunits

>                      ? i : i + nunits - gather_off_nunits;

> +         vec_perm_indices indices (sel, 2, nunits);

> +         perm_mask = vect_gen_perm_mask_checked (vectype, indices);

>

> -         perm_mask = vect_gen_perm_mask_checked (vectype, sel);

>           ncopies *= 2;

> +

>           for (i = 0; i < nunits; ++i)

>             sel[i] = i | gather_off_nunits;

> -         mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);

> +         indices.new_vector (sel, 2, gather_off_nunits);

> +         mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);

>         }

>        else

>         gcc_unreachable ();

> @@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple

>    unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);

>    unsigned word_bytes = num_bytes / nunits;

>

> -  auto_vec_perm_indices elts (num_bytes);

> +  vec_perm_builder elts (num_bytes, num_bytes, 1);

>    for (unsigned i = 0; i < nunits; ++i)

>      for (unsigned j = 0; j < word_bytes; ++j)

>        elts.quick_push ((i + 1) * word_bytes - j - 1);

>

> -  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))

> +  vec_perm_indices indices (elts, 1, num_bytes);

> +  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))

>      return false;

>

>    if (! vec_stmt)

> @@ -5809,22 +5816,25 @@ vectorizable_store (gimple *stmt, gimple

>         {

>           modifier = WIDEN;

>

> -         auto_vec_perm_indices sel (scatter_off_nunits);

> +         vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1);

>           for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)

>             sel.quick_push (i | nunits);

>

> -         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);

> +         vec_perm_indices indices (sel, 1, scatter_off_nunits);

> +         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,

> +                                                 indices);

>           gcc_assert (perm_mask != NULL_TREE);

>         }

>        else if (nunits == (unsigned int) scatter_off_nunits * 2)

>         {

>           modifier = NARROW;

>

> -         auto_vec_perm_indices sel (nunits);

> +         vec_perm_builder sel (nunits, nunits, 1);

>           for (i = 0; i < (unsigned int) nunits; ++i)

>             sel.quick_push (i | scatter_off_nunits);

>

> -         perm_mask = vect_gen_perm_mask_checked (vectype, sel);

> +         vec_perm_indices indices (sel, 2, nunits);

> +         perm_mask = vect_gen_perm_mask_checked (vectype, indices);

>           gcc_assert (perm_mask != NULL_TREE);

>           ncopies *= 2;

>         }

> @@ -6845,22 +6855,25 @@ vectorizable_load (gimple *stmt, gimple_

>         {

>           modifier = WIDEN;

>

> -         auto_vec_perm_indices sel (gather_off_nunits);

> +         vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);

>           for (i = 0; i < gather_off_nunits; ++i)

>             sel.quick_push (i | nunits);

>

> -         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);

> +         vec_perm_indices indices (sel, 1, gather_off_nunits);

> +         perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,

> +                                                 indices);

>         }

>        else if (nunits == gather_off_nunits * 2)

>         {

>           modifier = NARROW;

>

> -         auto_vec_perm_indices sel (nunits);

> +         vec_perm_builder sel (nunits, nunits, 1);

>           for (i = 0; i < nunits; ++i)

>             sel.quick_push (i < gather_off_nunits

>                             ? i : i + nunits - gather_off_nunits);

>

> -         perm_mask = vect_gen_perm_mask_checked (vectype, sel);

> +         vec_perm_indices indices (sel, 2, nunits);

> +         perm_mask = vect_gen_perm_mask_checked (vectype, indices);

>           ncopies *= 2;

>         }

>        else

> Index: gcc/tree-vect-generic.c

> ===================================================================

> --- gcc/tree-vect-generic.c     2017-12-09 22:47:27.883318100 +0000

> +++ gcc/tree-vect-generic.c     2017-12-09 22:48:47.547825355 +0000

> @@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs

>         mask = gimple_assign_rhs1 (def_stmt);

>      }

>

> -  if (TREE_CODE (mask) == VECTOR_CST)

> -    {

> -      auto_vec_perm_indices sel_int (elements);

> -

> -      for (i = 0; i < elements; ++i)

> -       sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))

> -                           & (2 * elements - 1));

> +  vec_perm_builder sel_int;

>

> -      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))

> +  if (TREE_CODE (mask) == VECTOR_CST

> +      && tree_to_vec_perm_builder (&sel_int, mask))

> +    {

> +      vec_perm_indices indices (sel_int, 2, elements);

> +      if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices))

>         {

>           gimple_assign_set_rhs3 (stmt, mask);

>           update_stmt (stmt);

> @@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs

>           != CODE_FOR_nothing

>           && TREE_CODE (vec1) == VECTOR_CST

>           && initializer_zerop (vec1)

> -         && sel_int[0]

> -         && sel_int[0] < elements)

> +         && indices[0]

> +         && indices[0] < elements)

>         {

>           for (i = 1; i < elements; ++i)

>             {

> -             unsigned int expected = i + sel_int[0];

> +             unsigned int expected = i + indices[0];

>               /* Indices into the second vector are all equivalent.  */

> -             if (MIN (elements, (unsigned) sel_int[i])

> +             if (MIN (elements, (unsigned) indices[i])

>                   != MIN (elements, expected))

>                 break;

>             }

> Index: gcc/tree-vect-loop.c

> ===================================================================

> --- gcc/tree-vect-loop.c        2017-12-09 22:47:27.884318101 +0000

> +++ gcc/tree-vect-loop.c        2017-12-09 22:48:47.547825355 +0000

> @@ -3714,12 +3714,11 @@ vect_estimate_min_profitable_iters (loop

>     vector elements (not bits) for a vector with NELT elements.  */

>  static void

>  calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,

> -                             vec_perm_indices *sel)

> +                             vec_perm_builder *sel)

>  {

> -  unsigned int i;

> -

> -  for (i = 0; i < nelt; i++)

> -    sel->quick_push ((i + offset) & (2 * nelt - 1));

> +  sel->new_vector (nelt, nelt, 1);

> +  for (unsigned int i = 0; i < nelt; i++)

> +    sel->quick_push (i + offset);

>  }

>

>  /* Checks whether the target supports whole-vector shifts for vectors of mode

> @@ -3732,13 +3731,13 @@ have_whole_vector_shift (machine_mode mo

>      return true;

>

>    unsigned int i, nelt = GET_MODE_NUNITS (mode);

> -  auto_vec_perm_indices sel (nelt);

> -

> +  vec_perm_builder sel;

> +  vec_perm_indices indices;

>    for (i = nelt/2; i >= 1; i/=2)

>      {

> -      sel.truncate (0);

>        calc_vec_perm_mask_for_shift (i, nelt, &sel);

> -      if (!can_vec_perm_const_p (mode, sel, false))

> +      indices.new_vector (sel, 2, nelt);

> +      if (!can_vec_perm_const_p (mode, indices, false))

>         return false;

>      }

>    return true;

> @@ -5028,7 +5027,8 @@ vect_create_epilog_for_reduction (vec<tr

>        if (reduce_with_shift && !slp_reduc)

>          {

>            int nelements = vec_size_in_bits / element_bitsize;

> -          auto_vec_perm_indices sel (nelements);

> +         vec_perm_builder sel;

> +         vec_perm_indices indices;

>

>            int elt_offset;

>

> @@ -5052,9 +5052,9 @@ vect_create_epilog_for_reduction (vec<tr

>                 elt_offset >= 1;

>                 elt_offset /= 2)

>              {

> -             sel.truncate (0);

>               calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);

> -             tree mask = vect_gen_perm_mask_any (vectype, sel);

> +             indices.new_vector (sel, 2, nelements);

> +             tree mask = vect_gen_perm_mask_any (vectype, indices);

>               epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR,

>                                                  new_temp, zero_vec, mask);

>                new_name = make_ssa_name (vec_dest, epilog_stmt);

> Index: gcc/config/aarch64/aarch64.c

> ===================================================================

> --- gcc/config/aarch64/aarch64.c        2017-12-09 22:47:27.856318084 +0000

> +++ gcc/config/aarch64/aarch64.c        2017-12-09 22:48:47.535824832 +0000

> @@ -13208,7 +13208,7 @@ #define MAX_VECT_LEN 16

>  struct expand_vec_perm_d

>  {

>    rtx target, op0, op1;

> -  auto_vec_perm_indices perm;

> +  vec_perm_indices perm;

>    machine_mode vmode;

>    bool one_vector_p;

>    bool testing_p;

> @@ -13598,10 +13598,7 @@ aarch64_expand_vec_perm_const_1 (struct

>    unsigned int nelt = d->perm.length ();

>    if (d->perm[0] >= nelt)

>      {

> -      gcc_assert (nelt == (nelt & -nelt));

> -      for (unsigned int i = 0; i < nelt; ++i)

> -       d->perm[i] ^= nelt; /* Keep the same index, but in the other vector.  */

> -

> +      d->perm.rotate_inputs (1);

>        std::swap (d->op0, d->op1);

>      }

>

> @@ -13641,12 +13638,10 @@ aarch64_vectorize_vec_perm_const (machin

>

>    /* Calculate whether all elements are in one vector.  */

>    unsigned int nelt = sel.length ();

> -  d.perm.reserve (nelt);

>    for (i = which = 0; i < nelt; ++i)

>      {

>        unsigned int ei = sel[i] & (2 * nelt - 1);

>        which |= (ei < nelt ? 1 : 2);

> -      d.perm.quick_push (ei);

>      }

>

>    switch (which)

> @@ -13665,8 +13660,6 @@ aarch64_vectorize_vec_perm_const (machin

>          input vector.  */

>        /* Fall Through.  */

>      case 2:

> -      for (i = 0; i < nelt; ++i)

> -       d.perm[i] &= nelt - 1;

>        d.op0 = op1;

>        d.one_vector_p = true;

>        break;

> @@ -13677,6 +13670,8 @@ aarch64_vectorize_vec_perm_const (machin

>        break;

>      }

>

> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);

> +

>    if (!d.testing_p)

>      return aarch64_expand_vec_perm_const_1 (&d);

>

> Index: gcc/config/arm/arm.c

> ===================================================================

> --- gcc/config/arm/arm.c        2017-12-09 22:47:27.858318085 +0000

> +++ gcc/config/arm/arm.c        2017-12-09 22:48:47.538824963 +0000

> @@ -28852,7 +28852,7 @@ #define MAX_VECT_LEN 16

>  struct expand_vec_perm_d

>  {

>    rtx target, op0, op1;

> -  auto_vec_perm_indices perm;

> +  vec_perm_indices perm;

>    machine_mode vmode;

>    bool one_vector_p;

>    bool testing_p;

> @@ -29360,9 +29360,7 @@ arm_expand_vec_perm_const_1 (struct expa

>    unsigned int nelt = d->perm.length ();

>    if (d->perm[0] >= nelt)

>      {

> -      for (unsigned int i = 0; i < nelt; ++i)

> -       d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);

> -

> +      d->perm.rotate_inputs (1);

>        std::swap (d->op0, d->op1);

>      }

>

> @@ -29402,12 +29400,10 @@ arm_vectorize_vec_perm_const (machine_mo

>    d.testing_p = !target;

>

>    nelt = GET_MODE_NUNITS (d.vmode);

> -  d.perm.reserve (nelt);

>    for (i = which = 0; i < nelt; ++i)

>      {

>        int ei = sel[i] & (2 * nelt - 1);

>        which |= (ei < nelt ? 1 : 2);

> -      d.perm.quick_push (ei);

>      }

>

>    switch (which)

> @@ -29426,8 +29422,6 @@ arm_vectorize_vec_perm_const (machine_mo

>          input vector.  */

>        /* FALLTHRU */

>      case 2:

> -      for (i = 0; i < nelt; ++i)

> -        d.perm[i] &= nelt - 1;

>        d.op0 = op1;

>        d.one_vector_p = true;

>        break;

> @@ -29438,6 +29432,8 @@ arm_vectorize_vec_perm_const (machine_mo

>        break;

>      }

>

> +  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);

> +

>    if (d.testing_p)

>      return arm_expand_vec_perm_const_1 (&d);

>

> Index: gcc/config/powerpcspe/powerpcspe.c

> ===================================================================

> --- gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:47:27.871318093 +0000

> +++ gcc/config/powerpcspe/powerpcspe.c  2017-12-09 22:48:47.541825094 +0000

> @@ -38780,7 +38780,7 @@ rs6000_expand_extract_even (rtx target,

>  {

>    machine_mode vmode = GET_MODE (target);

>    unsigned i, nelt = GET_MODE_NUNITS (vmode);

> -  vec_perm_builder perm (nelt);

> +  vec_perm_builder perm (nelt, nelt, 1);

>

>    for (i = 0; i < nelt; i++)

>      perm.quick_push (i * 2);

> @@ -38795,7 +38795,7 @@ rs6000_expand_interleave (rtx target, rt

>  {

>    machine_mode vmode = GET_MODE (target);

>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);

> -  vec_perm_builder perm (nelt);

> +  vec_perm_builder perm (nelt, nelt, 1);

>

>    high = (highp ? 0 : nelt / 2);

>    for (i = 0; i < nelt / 2; i++)

> Index: gcc/config/rs6000/rs6000.c

> ===================================================================

> --- gcc/config/rs6000/rs6000.c  2017-12-09 22:47:27.874318095 +0000

> +++ gcc/config/rs6000/rs6000.c  2017-12-09 22:48:47.544825224 +0000

> @@ -36017,7 +36017,7 @@ rs6000_expand_extract_even (rtx target,

>  {

>    machine_mode vmode = GET_MODE (target);

>    unsigned i, nelt = GET_MODE_NUNITS (vmode);

> -  vec_perm_builder perm (nelt);

> +  vec_perm_builder perm (nelt, nelt, 1);

>

>    for (i = 0; i < nelt; i++)

>      perm.quick_push (i * 2);

> @@ -36032,7 +36032,7 @@ rs6000_expand_interleave (rtx target, rt

>  {

>    machine_mode vmode = GET_MODE (target);

>    unsigned i, high, nelt = GET_MODE_NUNITS (vmode);

> -  vec_perm_builder perm (nelt);

> +  vec_perm_builder perm (nelt, nelt, 1);

>

>    high = (highp ? 0 : nelt / 2);

>    for (i = 0; i < nelt / 2; i++)
Richard Sandiford Dec. 12, 2017, 3:46 p.m. | #2
Richard Biener <richard.guenther@gmail.com> writes:
> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> This patch changes vec_perm_indices from a plain vec<> to a class

>> that stores a canonicalised permutation, using the same encoding

>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries

>> information about the number of vectors being permuted (currently

>> always 1 or 2) and the number of elements in each input vector.

>

> Before I dive into  the C++ details can you explain why it needs this

> info and how it encodes it for variable-length vectors?  To interleave

> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we

> can directly encode N here, can we?  extract even/odd should just

> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute

> one or two vectors (the one vector case just has two times the same

> vector) or how many elements each of the vectors (or the result) has.


One of the later patches switches the element types to HOST_WIDE_INT,
so that we can represent all ssizetypes.  Then there's a poly_int
patch (not yet posted) to make that poly_int64, so that we can
represent the N even for variable-length vectors.

The class needs to know the number of elements because that affects
the canonical representation.  E.g. extract even on fixed-length
vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },
which we can't encode as a simple series.  Interleave low with both
inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and
variable-length vectors.

Also, operator[] is supposed to return an in-range selector even if
the selector element is only implicitly encoded.  So we need to know
the number of input elements there.

Separating the number of input elements into the number of inputs
and the number of elements per input isn't really necessary, but made
it easier to provide routines for testing whether all selected
elements come from a particular input, and for rotating the selector
by a whole number of inputs.

Thanks,
Richard
Richard Biener Dec. 14, 2017, 10:36 a.m. | #3
On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> This patch changes vec_perm_indices from a plain vec<> to a class

>>> that stores a canonicalised permutation, using the same encoding

>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries

>>> information about the number of vectors being permuted (currently

>>> always 1 or 2) and the number of elements in each input vector.

>>

>> Before I dive into  the C++ details can you explain why it needs this

>> info and how it encodes it for variable-length vectors?  To interleave

>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we

>> can directly encode N here, can we?  extract even/odd should just

>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute

>> one or two vectors (the one vector case just has two times the same

>> vector) or how many elements each of the vectors (or the result) has.

>

> One of the later patches switches the element types to HOST_WIDE_INT,

> so that we can represent all ssizetypes.  Then there's a poly_int

> patch (not yet posted) to make that poly_int64, so that we can

> represent the N even for variable-length vectors.

>

> The class needs to know the number of elements because that affects

> the canonical representation.  E.g. extract even on fixed-length

> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },

> which we can't encode as a simple series.  Interleave low with both

> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and

> variable-length vectors.


Huh?  extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector
are referencing concat'ed input vectors.  So yes, for two same vectors
that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why
that should
be the canonical form?

> Also, operator[] is supposed to return an in-range selector even if

> the selector element is only implicitly encoded.  So we need to know

> the number of input elements there.

>

> Separating the number of input elements into the number of inputs

> and the number of elements per input isn't really necessary, but made

> it easier to provide routines for testing whether all selected

> elements come from a particular input, and for rotating the selector

> by a whole number of inputs.

>

> Thanks,

> Richard
Richard Sandiford Dec. 20, 2017, 1:48 p.m. | #4
Richard Biener <richard.guenther@gmail.com> writes:
> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Richard Biener <richard.guenther@gmail.com> writes:

>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford

>>> <richard.sandiford@linaro.org> wrote:

>>>> This patch changes vec_perm_indices from a plain vec<> to a class

>>>> that stores a canonicalised permutation, using the same encoding

>>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries

>>>> information about the number of vectors being permuted (currently

>>>> always 1 or 2) and the number of elements in each input vector.

>>>

>>> Before I dive into  the C++ details can you explain why it needs this

>>> info and how it encodes it for variable-length vectors?  To interleave

>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we

>>> can directly encode N here, can we?  extract even/odd should just

>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute

>>> one or two vectors (the one vector case just has two times the same

>>> vector) or how many elements each of the vectors (or the result) has.

>>

>> One of the later patches switches the element types to HOST_WIDE_INT,

>> so that we can represent all ssizetypes.  Then there's a poly_int

>> patch (not yet posted) to make that poly_int64, so that we can

>> represent the N even for variable-length vectors.

>>

>> The class needs to know the number of elements because that affects

>> the canonical representation.  E.g. extract even on fixed-length

>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },

>> which we can't encode as a simple series.  Interleave low with both

>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and

>> variable-length vectors.

>

> Huh?  extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector

> are referencing concat'ed input vectors.  So yes, for two same vectors

> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why

> that should

> be the canonical form?


Current practice is to use the single-input form where possible,
if both inputs are the same (see e.g. the VEC_PERM_EXPR handling
in fold-const.c).  It means that things like:

    _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
    _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
    _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>;

get folded to the same sequence, and so can be CSEd.

We could instead convert the single-input form to use the two-input
selector, but that would be harder.  The advantage of treating the
single-input form as canonical is that it works even for irregular
permutes.

Thanks,
Richard

>> Also, operator[] is supposed to return an in-range selector even if

>> the selector element is only implicitly encoded.  So we need to know

>> the number of input elements there.

>>

>> Separating the number of input elements into the number of inputs

>> and the number of elements per input isn't really necessary, but made

>> it easier to provide routines for testing whether all selected

>> elements come from a particular input, and for rotating the selector

>> by a whole number of inputs.

>>

>> Thanks,

>> Richard
Richard Biener Jan. 2, 2018, 1:15 p.m. | #5
On Wed, Dec 20, 2017 at 2:48 PM, Richard Sandiford
<richard.sandiford@linaro.org> wrote:
> Richard Biener <richard.guenther@gmail.com> writes:

>> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford

>> <richard.sandiford@linaro.org> wrote:

>>> Richard Biener <richard.guenther@gmail.com> writes:

>>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford

>>>> <richard.sandiford@linaro.org> wrote:

>>>>> This patch changes vec_perm_indices from a plain vec<> to a class

>>>>> that stores a canonicalised permutation, using the same encoding

>>>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries

>>>>> information about the number of vectors being permuted (currently

>>>>> always 1 or 2) and the number of elements in each input vector.

>>>>

>>>> Before I dive into  the C++ details can you explain why it needs this

>>>> info and how it encodes it for variable-length vectors?  To interleave

>>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we

>>>> can directly encode N here, can we?  extract even/odd should just

>>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute

>>>> one or two vectors (the one vector case just has two times the same

>>>> vector) or how many elements each of the vectors (or the result) has.

>>>

>>> One of the later patches switches the element types to HOST_WIDE_INT,

>>> so that we can represent all ssizetypes.  Then there's a poly_int

>>> patch (not yet posted) to make that poly_int64, so that we can

>>> represent the N even for variable-length vectors.

>>>

>>> The class needs to know the number of elements because that affects

>>> the canonical representation.  E.g. extract even on fixed-length

>>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },

>>> which we can't encode as a simple series.  Interleave low with both

>>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and

>>> variable-length vectors.

>>

>> Huh?  extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector

>> are referencing concat'ed input vectors.  So yes, for two same vectors

>> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why

>> that should

>> be the canonical form?

>

> Current practice is to use the single-input form where possible,

> if both inputs are the same (see e.g. the VEC_PERM_EXPR handling

> in fold-const.c).  It means that things like:

>

>     _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;

>     _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;

>     _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>;

>

> get folded to the same sequence, and so can be CSEd.

>

> We could instead convert the single-input form to use the two-input

> selector, but that would be harder.  The advantage of treating the

> single-input form as canonical is that it works even for irregular

> permutes.


Ok, I see.  Maybe adding a comment along this helps.

Thanks,
Richard.

> Thanks,

> Richard

>

>>> Also, operator[] is supposed to return an in-range selector even if

>>> the selector element is only implicitly encoded.  So we need to know

>>> the number of input elements there.

>>>

>>> Separating the number of input elements into the number of inputs

>>> and the number of elements per input isn't really necessary, but made

>>> it easier to provide routines for testing whether all selected

>>> elements come from a particular input, and for rotating the selector

>>> by a whole number of inputs.

>>>

>>> Thanks,

>>> Richard
Richard Sandiford Jan. 2, 2018, 6:30 p.m. | #6
Richard Biener <richard.guenther@gmail.com> writes:
> On Wed, Dec 20, 2017 at 2:48 PM, Richard Sandiford

> <richard.sandiford@linaro.org> wrote:

>> Richard Biener <richard.guenther@gmail.com> writes:

>>> On Tue, Dec 12, 2017 at 4:46 PM, Richard Sandiford

>>> <richard.sandiford@linaro.org> wrote:

>>>> Richard Biener <richard.guenther@gmail.com> writes:

>>>>> On Sun, Dec 10, 2017 at 12:20 AM, Richard Sandiford

>>>>> <richard.sandiford@linaro.org> wrote:

>>>>>> This patch changes vec_perm_indices from a plain vec<> to a class

>>>>>> that stores a canonicalised permutation, using the same encoding

>>>>>> as for VECTOR_CSTs.  This means that vec_perm_indices now carries

>>>>>> information about the number of vectors being permuted (currently

>>>>>> always 1 or 2) and the number of elements in each input vector.

>>>>>

>>>>> Before I dive into  the C++ details can you explain why it needs this

>>>>> info and how it encodes it for variable-length vectors?  To interleave

>>>>> two vectors you need sth like { 0, N, 1, N+1, ... }, I'm not sure we

>>>>> can directly encode N here, can we?  extract even/odd should just

>>>>> work as { 0, 2, 4, 6, ...} without knowledge of whether we permute

>>>>> one or two vectors (the one vector case just has two times the same

>>>>> vector) or how many elements each of the vectors (or the result) has.

>>>>

>>>> One of the later patches switches the element types to HOST_WIDE_INT,

>>>> so that we can represent all ssizetypes.  Then there's a poly_int

>>>> patch (not yet posted) to make that poly_int64, so that we can

>>>> represent the N even for variable-length vectors.

>>>>

>>>> The class needs to know the number of elements because that affects

>>>> the canonical representation.  E.g. extract even on fixed-length

>>>> vectors with both inputs the same should be { 0, 2, 4, ..., 0, 2, 4 ... },

>>>> which we can't encode as a simple series.  Interleave low with both

>>>> inputs the same should be { 0, 0, 1, 1, ... } for both fixed-length and

>>>> variable-length vectors.

>>>

>>> Huh?  extract even is { 0, 2, 4, 6, 8 ... } indexes in the selection vector

>>> are referencing concat'ed input vectors.  So yes, for two same vectors

>>> that's effectively { 0, 2, 4, ..., 0, 2, 4, ... } but I don't see why

>>> that should

>>> be the canonical form?

>>

>> Current practice is to use the single-input form where possible,

>> if both inputs are the same (see e.g. the VEC_PERM_EXPR handling

>> in fold-const.c).  It means that things like:

>>

>>     _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;

>>     _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;

>>     _3 = VEC_PERM_EXPR <a, b, { 0, 2, 4, 6, 0, 2, 4, 6 }>;

>>

>> get folded to the same sequence, and so can be CSEd.

>>

>> We could instead convert the single-input form to use the two-input

>> selector, but that would be harder.  The advantage of treating the

>> single-input form as canonical is that it works even for irregular

>> permutes.

>

> Ok, I see.  Maybe adding a comment along this helps.


OK, thanks, installed as below with that change.

Richard


2018-01-02  Richard Sandiford  <richard.sandiford@linaro.org>

gcc/
	* int-vector-builder.h: New file.
	* vec-perm-indices.h: Include int-vector-builder.h.
	(vec_perm_indices): Redefine as an int_vector_builder.
	(auto_vec_perm_indices): Delete.
	(vec_perm_builder): Redefine as a stand-alone class.
	(vec_perm_indices::vec_perm_indices): New function.
	(vec_perm_indices::clamp): Likewise.
	* vec-perm-indices.c: Include fold-const.h and tree-vector-builder.h.
	(vec_perm_indices::new_vector): New function.
	(vec_perm_indices::new_expanded_vector): Update for new
	vec_perm_indices class.
	(vec_perm_indices::rotate_inputs): New function.
	(vec_perm_indices::all_in_range_p): Operate directly on the
	encoded form, without computing elided elements.
	(tree_to_vec_perm_builder): Operate directly on the VECTOR_CST
	encoding.  Update for new vec_perm_indices class.
	* optabs.c (expand_vec_perm_const): Create a vec_perm_indices for
	the given vec_perm_builder.
	(expand_vec_perm_var): Update vec_perm_builder constructor.
	(expand_mult_highpart): Use vec_perm_builder instead of
	auto_vec_perm_indices.
	* optabs-query.c (can_mult_highpart_p): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use a single
	or double series encoding as appropriate.
	* fold-const.c (fold_ternary_loc): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.
	* tree-ssa-forwprop.c (simplify_vector_constructor): Likewise.
	* tree-vect-data-refs.c (vect_grouped_store_supported): Likewise.
	(vect_permute_store_chain): Likewise.
	(vect_grouped_load_supported): Likewise.
	(vect_permute_load_chain): Likewise.
	(vect_shift_permute_load_chain): Likewise.
	* tree-vect-slp.c (vect_build_slp_tree_1): Likewise.
	(vect_transform_slp_perm_load): Likewise.
	(vect_schedule_slp_instance): Likewise.
	* tree-vect-stmts.c (perm_mask_for_reverse): Likewise.
	(vectorizable_mask_load_store): Likewise.
	(vectorizable_bswap): Likewise.
	(vectorizable_store): Likewise.
	(vectorizable_load): Likewise.
	* tree-vect-generic.c (lower_vec_perm): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Use
	tree_to_vec_perm_builder to read the vector from a tree.
	* tree-vect-loop.c (calc_vec_perm_mask_for_shift): Take a
	vec_perm_builder instead of a vec_perm_indices.
	(have_whole_vector_shift): Use vec_perm_builder and
	vec_perm_indices instead of auto_vec_perm_indices.  Leave the
	truncation to calc_vec_perm_mask_for_shift.
	(vect_create_epilog_for_reduction): Likewise.
	* config/aarch64/aarch64.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(aarch64_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(aarch64_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/arm/arm.c (expand_vec_perm_d::perm): Change
	from auto_vec_perm_indices to vec_perm_indices.
	(arm_expand_vec_perm_const_1): Use rotate_inputs on d.perm
	instead of changing individual elements.
	(arm_vectorize_vec_perm_const): Use new_vector to install
	the vector in d.perm.
	* config/powerpcspe/powerpcspe.c (rs6000_expand_extract_even):
	Update vec_perm_builder constructor.
	(rs6000_expand_interleave): Likewise.
	* config/rs6000/rs6000.c (rs6000_expand_extract_even): Likewise.
	(rs6000_expand_interleave): Likewise.
------------------------------------------------------------------------------

Index: gcc/int-vector-builder.h
===================================================================
--- /dev/null	2017-12-30 11:27:13.464311244 +0000
+++ gcc/int-vector-builder.h	2018-01-02 17:01:28.746627393 +0000
@@ -0,0 +1,90 @@
+/* A class for building vector integer constants.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_INT_VECTOR_BUILDER_H
+#define GCC_INT_VECTOR_BUILDER_H 1
+
+#include "vector-builder.h"
+
+/* This class is used to build vectors of integer type T using the same
+   encoding as tree and rtx constants.  See vector_builder for more
+   details.  */
+template<typename T>
+class int_vector_builder : public vector_builder<T, int_vector_builder<T> >
+{
+  typedef vector_builder<T, int_vector_builder> parent;
+  friend class vector_builder<T, int_vector_builder>;
+
+public:
+  int_vector_builder () {}
+  int_vector_builder (unsigned int, unsigned int, unsigned int);
+
+  using parent::new_vector;
+
+private:
+  bool equal_p (T, T) const;
+  bool allow_steps_p () const { return true; }
+  bool integral_p (T) const { return true; }
+  T step (T, T) const;
+  T apply_step (T, unsigned int, T) const;
+  bool can_elide_p (T) const { return true; }
+  void note_representative (T *, T) {}
+};
+
+/* Create a new builder for a vector with FULL_NELTS elements.
+   Initially encode the value as NPATTERNS interleaved patterns with
+   NELTS_PER_PATTERN elements each.  */
+
+template<typename T>
+inline
+int_vector_builder<T>::int_vector_builder (unsigned int full_nelts,
+					   unsigned int npatterns,
+					   unsigned int nelts_per_pattern)
+{
+  new_vector (full_nelts, npatterns, nelts_per_pattern);
+}
+
+/* Return true if elements ELT1 and ELT2 are equal.  */
+
+template<typename T>
+inline bool
+int_vector_builder<T>::equal_p (T elt1, T elt2) const
+{
+  return elt1 == elt2;
+}
+
+/* Return the value of element ELT2 minus the value of element ELT1.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::step (T elt1, T elt2) const
+{
+  return elt2 - elt1;
+}
+
+/* Return a vector element with the value BASE + FACTOR * STEP.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const
+{
+  return base + factor * step;
+}
+
+#endif
Index: gcc/vec-perm-indices.h
===================================================================
--- gcc/vec-perm-indices.h	2018-01-02 17:01:26.633719678 +0000
+++ gcc/vec-perm-indices.h	2018-01-02 18:25:11.081076126 +0000
@@ -20,30 +20,117 @@ Software Foundation; either version 3, o
 #ifndef GCC_VEC_PERN_INDICES_H
 #define GCC_VEC_PERN_INDICES_H 1
 
+#include "int-vector-builder.h"
+
+/* A vector_builder for building constant permutation vectors.
+   The elements do not need to be clamped to a particular range
+   of input elements.  */
+typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder;
+
 /* This class represents a constant permutation vector, such as that used
-   as the final operand to a VEC_PERM_EXPR.  */
-class vec_perm_indices : public auto_vec<unsigned short, 32>
+   as the final operand to a VEC_PERM_EXPR.
+
+   Permutation vectors select indices modulo the number of input elements,
+   and the class canonicalizes each permutation vector for a particular
+   number of input vectors and for a particular number of elements per
+   input.  For example, the gimple statements:
+
+    _1 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 0, 2, 4, 6 }>;
+    _2 = VEC_PERM_EXPR <a, a, { 0, 2, 4, 6, 8, 10, 12, 14 }>;
+    _3 = VEC_PERM_EXPR <a, a, { 0, 2, 20, 22, 24, 2, 4, 14 }>;
+
+   effectively have only a single vector input "a".  If "a" has 8
+   elements, the indices select elements modulo 8, which makes all three
+   VEC_PERM_EXPRs equivalent.  The canonical form is for the indices to be
+   in the range [0, number of input elements - 1], so the class treats the
+   second and third permutation vectors as though they had been the first.
+
+   The class copes with cases in which the input and output vectors have
+   different numbers of elements.  */
+class vec_perm_indices
 {
-  typedef unsigned short element_type;
-  typedef auto_vec<element_type, 32> parent_type;
+  typedef HOST_WIDE_INT element_type;
 
 public:
-  vec_perm_indices () {}
-  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
+  vec_perm_indices ();
+  vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int);
 
+  void new_vector (const vec_perm_builder &, unsigned int, unsigned int);
   void new_expanded_vector (const vec_perm_indices &, unsigned int);
+  void rotate_inputs (int delta);
+
+  /* Return the underlying vector encoding.  */
+  const vec_perm_builder &encoding () const { return m_encoding; }
+
+  /* Return the number of output elements.  This is called length ()
+     so that we present a more vec-like interface.  */
+  unsigned int length () const { return m_encoding.full_nelts (); }
 
+  /* Return the number of input vectors being permuted.  */
+  unsigned int ninputs () const { return m_ninputs; }
+
+  /* Return the number of elements in each input vector.  */
+  unsigned int nelts_per_input () const { return m_nelts_per_input; }
+
+  /* Return the total number of input elements.  */
+  unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; }
+
+  element_type clamp (element_type) const;
+  element_type operator[] (unsigned int i) const;
   bool all_in_range_p (element_type, element_type) const;
 
 private:
   vec_perm_indices (const vec_perm_indices &);
-};
 
-/* Temporary.  */
-typedef vec_perm_indices vec_perm_builder;
-typedef vec_perm_indices auto_vec_perm_indices;
+  vec_perm_builder m_encoding;
+  unsigned int m_ninputs;
+  unsigned int m_nelts_per_input;
+};
 
 bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
 rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
 
+inline
+vec_perm_indices::vec_perm_indices ()
+  : m_ninputs (0),
+    m_nelts_per_input (0)
+{
+}
+
+/* Construct a permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of
+   the new vector from ELEMENTS, clamping each one to be in range.  */
+
+inline
+vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements,
+				    unsigned int ninputs,
+				    unsigned int nelts_per_input)
+{
+  new_vector (elements, ninputs, nelts_per_input);
+}
+
+/* Return the canonical value for permutation vector element ELT,
+   taking into account the current number of input elements.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::clamp (element_type elt) const
+{
+  element_type limit = input_nelts ();
+  elt %= limit;
+  /* Treat negative elements as counting from the end.  This only matters
+     if the vector size is not a power of 2.  */
+  if (elt < 0)
+    elt += limit;
+  return elt;
+}
+
+/* Return the value of vector element I, which might or might not be
+   explicitly encoded.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::operator[] (unsigned int i) const
+{
+  return clamp (m_encoding.elt (i));
+}
+
 #endif
Index: gcc/vec-perm-indices.c
===================================================================
--- gcc/vec-perm-indices.c	2018-01-02 17:01:26.632719721 +0000
+++ gcc/vec-perm-indices.c	2018-01-02 17:01:28.750627219 +0000
@@ -22,11 +22,33 @@ Software Foundation; either version 3, o
 #include "coretypes.h"
 #include "vec-perm-indices.h"
 #include "tree.h"
+#include "fold-const.h"
+#include "tree-vector-builder.h"
 #include "backend.h"
 #include "rtl.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
 
+/* Switch to a new permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
+   new permutation vector from ELEMENTS, clamping each one to be in range.  */
+
+void
+vec_perm_indices::new_vector (const vec_perm_builder &elements,
+			      unsigned int ninputs,
+			      unsigned int nelts_per_input)
+{
+  m_ninputs = ninputs;
+  m_nelts_per_input = nelts_per_input;
+  /* Expand the encoding and clamp each element.  E.g. { 0, 2, 4, ... }
+     might wrap halfway if there is only one vector input.  */
+  unsigned int full_nelts = elements.full_nelts ();
+  m_encoding.new_vector (full_nelts, full_nelts, 1);
+  for (unsigned int i = 0; i < full_nelts; ++i)
+    m_encoding.quick_push (clamp (elements.elt (i)));
+  m_encoding.finalize ();
+}
+
 /* Switch to a new permutation vector that selects the same input elements
    as ORIG, but with each element split into FACTOR pieces.  For example,
    if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
@@ -36,14 +58,31 @@ Software Foundation; either version 3, o
 vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
 				       unsigned int factor)
 {
-  truncate (0);
-  reserve (orig.length () * factor);
-  for (unsigned int i = 0; i < orig.length (); ++i)
+  m_ninputs = orig.m_ninputs;
+  m_nelts_per_input = orig.m_nelts_per_input * factor;
+  m_encoding.new_vector (orig.m_encoding.full_nelts () * factor,
+			 orig.m_encoding.npatterns () * factor,
+			 orig.m_encoding.nelts_per_pattern ());
+  unsigned int encoded_nelts = orig.m_encoding.encoded_nelts ();
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
     {
-      element_type base = orig[i] * factor;
+      element_type base = orig.m_encoding[i] * factor;
       for (unsigned int j = 0; j < factor; ++j)
-	quick_push (base + j);
+	m_encoding.quick_push (base + j);
     }
+  m_encoding.finalize ();
+}
+
+/* Rotate the inputs of the permutation right by DELTA inputs.  This changes
+   the values of the permutation vector but it doesn't change the way that
+   the elements are encoded.  */
+
+void
+vec_perm_indices::rotate_inputs (int delta)
+{
+  element_type element_delta = delta * m_nelts_per_input;
+  for (unsigned int i = 0; i < m_encoding.length (); ++i)
+    m_encoding[i] = clamp (m_encoding[i] + element_delta);
 }
 
 /* Return true if all elements of the permutation vector are in the range
@@ -52,9 +91,44 @@ vec_perm_indices::new_expanded_vector (c
 bool
 vec_perm_indices::all_in_range_p (element_type start, element_type size) const
 {
-  for (unsigned int i = 0; i < length (); ++i)
-    if ((*this)[i] < start || ((*this)[i] - start) >= size)
+  /* Check the first two elements of each pattern.  */
+  unsigned int npatterns = m_encoding.npatterns ();
+  unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern ();
+  unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2);
+  for (unsigned int i = 0; i < base_nelts; ++i)
+    if (m_encoding[i] < start || (m_encoding[i] - start) >= size)
       return false;
+
+  /* For stepped encodings, check the full range of the series.  */
+  if (nelts_per_pattern == 3)
+    {
+      element_type limit = input_nelts ();
+
+      /* The number of elements in each pattern beyond the first two
+	 that we checked above.  */
+      unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2;
+      for (unsigned int i = 0; i < npatterns; ++i)
+	{
+	  /* BASE1 has been checked but BASE2 hasn't.   */
+	  element_type base1 = m_encoding[i + npatterns];
+	  element_type base2 = m_encoding[i + base_nelts];
+
+	  /* The step to add to get from BASE1 to each subsequent value.  */
+	  element_type step = clamp (base2 - base1);
+
+	  /* STEP has no inherent sign, so a value near LIMIT can
+	     act as a negative step.  The series is in range if it
+	     is in range according to one of the two interpretations.
+
+	     Since we're dealing with clamped values, ELEMENT_TYPE is
+	     wide enough for overflow not to be a problem.  */
+	  element_type headroom_down = base1 - start;
+	  element_type headroom_up = size - headroom_down - 1;
+	  if (headroom_up < step * step_nelts
+	      && headroom_down < (limit - step) * step_nelts)
+	    return false;
+	}
+    }
   return true;
 }
 
@@ -65,15 +139,16 @@ vec_perm_indices::all_in_range_p (elemen
 bool
 tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
 {
-  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
-  for (unsigned int i = 0; i < nelts; ++i)
-    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
+  unsigned int encoded_nelts = vector_cst_encoded_nelts (cst);
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i)))
       return false;
 
-  builder->reserve (nelts);
-  for (unsigned int i = 0; i < nelts; ++i)
-    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
-			 & (2 * nelts - 1));
+  builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)),
+		       VECTOR_CST_NPATTERNS (cst),
+		       VECTOR_CST_NELTS_PER_PATTERN (cst));
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i)));
   return true;
 }
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2018-01-02 17:01:27.762670303 +0000
+++ gcc/optabs.c	2018-01-02 17:01:28.747627350 +0000
@@ -5476,6 +5476,11 @@ expand_vec_perm_const (machine_mode mode
   rtx_insn *last = get_last_insn ();
 
   bool single_arg_p = rtx_equal_p (v0, v1);
+  /* Always specify two input vectors here and leave the target to handle
+     cases in which the inputs are equal.  Not all backends can cope with
+     the single-input representation when testing for a double-input
+     target instruction.  */
+  vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
 
   /* See if this can be handled with a vec_shr.  We only do this if the
      second vector is all zeroes.  */
@@ -5488,7 +5493,7 @@ expand_vec_perm_const (machine_mode mode
       && (shift_code != CODE_FOR_nothing
 	  || shift_code_qi != CODE_FOR_nothing))
     {
-      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
       if (shift_amt)
 	{
 	  struct expand_operand ops[3];
@@ -5520,7 +5525,7 @@ expand_vec_perm_const (machine_mode mode
       else
 	v1 = force_reg (mode, v1);
 
-      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
+      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
 	return target;
     }
 
@@ -5529,7 +5534,7 @@ expand_vec_perm_const (machine_mode mode
   rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
   if (qimode != VOIDmode)
     {
-      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
+      qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode));
       target_qi = gen_reg_rtx (qimode);
       v0_qi = gen_lowpart (qimode, v0);
       v1_qi = gen_lowpart (qimode, v1);
@@ -5556,7 +5561,7 @@ expand_vec_perm_const (machine_mode mode
      REQUIRED_SEL_MODE is OK.  */
   if (sel_mode != required_sel_mode)
     {
-      if (!selector_fits_mode_p (required_sel_mode, sel))
+      if (!selector_fits_mode_p (required_sel_mode, indices))
 	{
 	  delete_insns_since (last);
 	  return NULL_RTX;
@@ -5567,7 +5572,7 @@ expand_vec_perm_const (machine_mode mode
   insn_code icode = direct_optab_handler (vec_perm_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
-      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
+      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices);
       rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
       if (tmp)
 	return tmp;
@@ -5642,7 +5647,7 @@ expand_vec_perm_var (machine_mode mode,
   gcc_assert (sel != NULL);
 
   /* Broadcast the low byte each element into each of its bytes.  */
-  vec_perm_builder const_sel (w);
+  vec_perm_builder const_sel (w, w, 1);
   for (i = 0; i < w; ++i)
     {
       int this_e = i / u * u;
@@ -5890,7 +5895,7 @@ expand_mult_highpart (machine_mode mode,
   expand_insn (optab_handler (tab2, mode), 3, eops);
   m2 = gen_lowpart (mode, eops[0].value);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   if (method == 2)
     {
       for (i = 0; i < nunits; ++i)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2018-01-02 17:01:27.761670346 +0000
+++ gcc/optabs-query.c	2018-01-02 17:01:28.746627393 +0000
@@ -516,12 +516,13 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (!BYTES_BIG_ENDIAN
 			    + (i & ~1)
 			    + ((i & 1) ? nunits : 0));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 2;
 	}
     }
@@ -532,10 +533,11 @@ can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 3;
 	}
     }
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2018-01-02 17:01:26.628719897 +0000
+++ gcc/fold-const.c	2018-01-02 17:01:28.746627393 +0000
@@ -11373,7 +11373,7 @@ fold_ternary_loc (location_t loc, enum t
 	    {
 	      unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
 	      gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	      auto_vec_perm_indices sel (nelts);
+	      vec_perm_builder sel (nelts, nelts, 1);
 	      for (i = 0; i < nelts; i++)
 		{
 		  tree val = VECTOR_CST_ELT (arg0, i);
@@ -11384,7 +11384,8 @@ fold_ternary_loc (location_t loc, enum t
 		  else /* Currently unreachable.  */
 		    return NULL_TREE;
 		}
-	      tree t = fold_vec_perm (type, arg1, arg2, sel);
+	      tree t = fold_vec_perm (type, arg1, arg2,
+				      vec_perm_indices (sel, 2, nelts));
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11716,8 +11717,8 @@ fold_ternary_loc (location_t loc, enum t
 	  mask2 = 2 * nelts - 1;
 	  mask = single_arg ? (nelts - 1) : mask2;
 	  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	  auto_vec_perm_indices sel (nelts);
-	  auto_vec_perm_indices sel2 (nelts);
+	  vec_perm_builder sel (nelts, nelts, 1);
+	  vec_perm_builder sel2 (nelts, nelts, 1);
 	  for (i = 0; i < nelts; i++)
 	    {
 	      tree val = VECTOR_CST_ELT (arg2, i);
@@ -11762,12 +11763,13 @@ fold_ternary_loc (location_t loc, enum t
 	      need_mask_canon = true;
 	    }
 
+	  vec_perm_indices indices (sel, 2, nelts);
 	  if ((TREE_CODE (op0) == VECTOR_CST
 	       || TREE_CODE (op0) == CONSTRUCTOR)
 	      && (TREE_CODE (op1) == VECTOR_CST
 		  || TREE_CODE (op1) == CONSTRUCTOR))
 	    {
-	      tree t = fold_vec_perm (type, op0, op1, sel);
+	      tree t = fold_vec_perm (type, op0, op1, indices);
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11779,11 +11781,14 @@ fold_ternary_loc (location_t loc, enum t
 	     argument permutation while still allowing an equivalent
 	     2-argument version.  */
 	  if (need_mask_canon && arg2 == op2
-	      && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
-	      && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
+	      && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
+	      && can_vec_perm_const_p (TYPE_MODE (type),
+				       vec_perm_indices (sel2, 2, nelts),
+				       false))
 	    {
 	      need_mask_canon = need_mask_canon2;
-	      sel = sel2;
+	      sel.truncate (0);
+	      sel.splice (sel2);
 	    }
 
 	  if (need_mask_canon && arg2 == op2)
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2018-01-02 17:01:26.630719809 +0000
+++ gcc/tree-ssa-forwprop.c	2018-01-02 17:01:28.747627350 +0000
@@ -2018,7 +2018,7 @@ simplify_vector_constructor (gimple_stmt
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  auto_vec_perm_indices sel (nelts);
+  vec_perm_builder sel (nelts, nelts, 1);
   orig = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
@@ -2109,7 +2109,8 @@ simplify_vector_constructor (gimple_stmt
     {
       tree mask_type;
 
-      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
+      vec_perm_indices indices (sel, 1, nelts);
+      if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
 	return false;
       mask_type
 	= build_vector_type (build_nonstandard_integer_type (elem_size, 1),
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2018-01-02 17:01:26.630719809 +0000
+++ gcc/tree-vect-data-refs.c	2018-01-02 17:01:28.748627306 +0000
@@ -4579,7 +4579,7 @@ vect_grouped_store_supported (tree vecty
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
@@ -4587,6 +4587,7 @@ vect_grouped_store_supported (tree vecty
 	  unsigned int j0 = 0, j1 = 0, j2 = 0;
 	  unsigned int i, j;
 
+	  vec_perm_indices indices;
 	  for (j = 0; j < 3; j++)
 	    {
 	      int nelt0 = ((3 - j) * nelt) % 3;
@@ -4601,7 +4602,8 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = 0;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4618,7 +4620,8 @@ vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = nelt + j2++;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4638,11 +4641,13 @@ vect_grouped_store_supported (tree vecty
 	      sel[i * 2] = i;
 	      sel[i * 2 + 1] = i + nelt;
 	    }
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] += nelt / 2;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
 	}
@@ -4744,7 +4749,7 @@ vect_permute_store_chain (vec<tree> dr_c
   unsigned int i, n, log_length = exact_log2 (length);
   unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -4755,6 +4760,7 @@ vect_permute_store_chain (vec<tree> dr_c
     {
       unsigned int j0 = 0, j1 = 0, j2 = 0;
 
+      vec_perm_indices indices;
       for (j = 0; j < 3; j++)
         {
 	  int nelt0 = ((3 - j) * nelt) % 3;
@@ -4770,7 +4776,8 @@ vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = 0;
 	    }
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0; i < nelt; i++)
 	    {
@@ -4781,7 +4788,8 @@ vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = nelt + j2++;
 	    }
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  vect1 = dr_chain[0];
 	  vect2 = dr_chain[1];
@@ -4818,11 +4826,13 @@ vect_permute_store_chain (vec<tree> dr_c
 	  sel[i * 2] = i;
 	  sel[i * 2 + 1] = i + nelt;
 	}
-	perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	vec_perm_indices indices (sel, 2, nelt);
+	perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0; i < nelt; i++)
 	  sel[i] += nelt / 2;
-	perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	indices.new_vector (sel, 2, nelt);
+	perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0, n = log_length; i < n; i++)
 	  {
@@ -5167,11 +5177,12 @@ vect_grouped_load_supported (tree vectyp
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
 	{
+	  vec_perm_indices indices;
 	  unsigned int k;
 	  for (k = 0; k < 3; k++)
 	    {
@@ -5180,7 +5191,8 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = 3 * i + k;
 		else
 		  sel[i] = 0;
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5193,7 +5205,8 @@ vect_grouped_load_supported (tree vectyp
 		  sel[i] = i;
 		else
 		  sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5208,13 +5221,16 @@ vect_grouped_load_supported (tree vectyp
 	{
 	  /* If length is not equal to 3 then only power of 2 is supported.  */
 	  gcc_assert (pow2p_hwi (count));
+
 	  for (i = 0; i < nelt; i++)
 	    sel[i] = i * 2;
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] = i * 2 + 1;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
         }
@@ -5329,7 +5345,7 @@ vect_permute_load_chain (vec<tree> dr_ch
   unsigned int i, j, log_length = exact_log2 (length);
   unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5340,6 +5356,7 @@ vect_permute_load_chain (vec<tree> dr_ch
     {
       unsigned int k;
 
+      vec_perm_indices indices;
       for (k = 0; k < 3; k++)
 	{
 	  for (i = 0; i < nelt; i++)
@@ -5347,15 +5364,16 @@ vect_permute_load_chain (vec<tree> dr_ch
 	      sel[i] = 3 * i + k;
 	    else
 	      sel[i] = 0;
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0, j = 0; i < nelt; i++)
 	    if (3 * i + k < 2 * nelt)
 	      sel[i] = i;
 	    else
 	      sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  first_vect = dr_chain[0];
 	  second_vect = dr_chain[1];
@@ -5387,11 +5405,13 @@ vect_permute_load_chain (vec<tree> dr_ch
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2;
-      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
+      vec_perm_indices indices (sel, 2, nelt);
+      perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2 + 1;
-      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+      indices.new_vector (sel, 2, nelt);
+      perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5527,7 +5547,7 @@ vect_shift_permute_load_chain (vec<tree>
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5541,7 +5561,8 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i * 2;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2 + 1;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5549,13 +5570,14 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt / 2; ++i)
 	sel[i] = i * 2 + 1;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5563,20 +5585,21 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = nelt / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to select vector from 2.
 	 For vector length 8 it is {0 1 2 3 12 13 14 15}.  */
@@ -5584,14 +5607,15 @@ vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i;
       for (i = nelt / 2; i < nelt; i++)
 	sel[i] = nelt + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "select is not supported by target\n");
 	  return false;
 	}
-      select_mask = vect_gen_perm_mask_checked (vectype, sel);
+      select_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5647,7 +5671,8 @@ vect_shift_permute_load_chain (vec<tree>
 	  sel[i] = 3 * k + (l % 3);
 	  k++;
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5655,59 +5680,63 @@ vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      perm3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + 1 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift2_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift2_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift4_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift4_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (k = 0; k < 3; k++)
 	{
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2018-01-02 17:01:26.632719721 +0000
+++ gcc/tree-vect-slp.c	2018-01-02 17:01:28.749627263 +0000
@@ -894,7 +894,7 @@ vect_build_slp_tree_1 (vec_info *vinfo,
       && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
     {
       unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
-      auto_vec_perm_indices sel (count);
+      vec_perm_builder sel (count, count, 1);
       for (i = 0; i < count; ++i)
 	{
 	  unsigned int elt = i;
@@ -902,7 +902,8 @@ vect_build_slp_tree_1 (vec_info *vinfo,
 	    elt += count;
 	  sel.quick_push (elt);
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, count);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  for (i = 0; i < group_size; ++i)
 	    if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -3570,8 +3571,9 @@ vect_transform_slp_perm_load (slp_tree n
     (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
   mask_type = get_vectype_for_scalar_type (mask_element_type);
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  auto_vec_perm_indices mask (nunits);
+  vec_perm_builder mask (nunits, nunits, 1);
   mask.quick_grow (nunits);
+  vec_perm_indices indices;
 
   /* Initialize the vect stmts of NODE to properly insert the generated
      stmts later.  */
@@ -3644,10 +3646,10 @@ vect_transform_slp_perm_load (slp_tree n
 	    noop_p = false;
 	  mask[index++] = mask_element;
 
-	  if (index == nunits)
+	  if (index == nunits && !noop_p)
 	    {
-	      if (! noop_p
-		  && ! can_vec_perm_const_p (mode, mask))
+	      indices.new_vector (mask, 2, nunits);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    {
@@ -3655,16 +3657,19 @@ vect_transform_slp_perm_load (slp_tree n
 				       vect_location, 
 				       "unsupported vect permute { ");
 		      for (i = 0; i < nunits; ++i)
-			dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
+			dump_printf (MSG_MISSED_OPTIMIZATION,
+				     HOST_WIDE_INT_PRINT_DEC " ", mask[i]);
 		      dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
 		    }
 		  gcc_assert (analyze_only);
 		  return false;
 		}
 
-	      if (! noop_p)
-		++*n_perms;
+	      ++*n_perms;
+	    }
 
+	  if (index == nunits)
+	    {
 	      if (!analyze_only)
 		{
 		  tree mask_vec = NULL_TREE;
@@ -3797,7 +3802,7 @@ vect_schedule_slp_instance (slp_tree nod
       enum tree_code code0 = gimple_assign_rhs_code (stmt);
       enum tree_code ocode = ERROR_MARK;
       gimple *ostmt;
-      auto_vec_perm_indices mask (group_size);
+      vec_perm_builder mask (group_size, group_size, 1);
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt)
 	if (gimple_assign_rhs_code (ostmt) != code0)
 	  {
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2018-01-02 17:01:26.632719721 +0000
+++ gcc/tree-vect-stmts.c	2018-01-02 17:01:28.750627219 +0000
@@ -1717,13 +1717,14 @@ perm_mask_for_reverse (tree vectype)
 
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   for (i = 0; i < nunits; ++i)
     sel.quick_push (nunits - 1 - i);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+  vec_perm_indices indices (sel, 1, nunits);
+  if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
     return NULL_TREE;
-  return vect_gen_perm_mask_checked (vectype, sel);
+  return vect_gen_perm_mask_checked (vectype, indices);
 }
 
 /* A subroutine of get_load_store_type, with a subset of the same
@@ -2185,27 +2186,32 @@ vectorizable_mask_load_store (gimple *st
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  sel.quick_grow (nunits);
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i < gather_off_nunits
 		     ? i : i + nunits - gather_off_nunits;
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
+
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i | gather_off_nunits;
-	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
+	  indices.new_vector (sel, 2, gather_off_nunits);
+	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);
 	}
       else
 	gcc_unreachable ();
@@ -2498,12 +2504,13 @@ vectorizable_bswap (gimple *stmt, gimple
   unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
   unsigned word_bytes = num_bytes / nunits;
 
-  auto_vec_perm_indices elts (num_bytes);
+  vec_perm_builder elts (num_bytes, num_bytes, 1);
   for (unsigned i = 0; i < nunits; ++i)
     for (unsigned j = 0; j < word_bytes; ++j)
       elts.quick_push ((i + 1) * word_bytes - j - 1);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
+  vec_perm_indices indices (elts, 1, num_bytes);
+  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
     return false;
 
   if (! vec_stmt)
@@ -5826,22 +5833,25 @@ vectorizable_store (gimple *stmt, gimple
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (scatter_off_nunits);
+	  vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1);
 	  for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, scatter_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	}
       else if (nunits == (unsigned int) scatter_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < (unsigned int) nunits; ++i)
 	    sel.quick_push (i | scatter_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	  ncopies *= 2;
 	}
@@ -6862,22 +6872,25 @@ vectorizable_load (gimple *stmt, gimple_
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (i < gather_off_nunits
 			    ? i : i + nunits - gather_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  ncopies *= 2;
 	}
       else
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2018-01-02 17:01:26.631719765 +0000
+++ gcc/tree-vect-generic.c	2018-01-02 17:01:28.748627306 +0000
@@ -1299,15 +1299,13 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	mask = gimple_assign_rhs1 (def_stmt);
     }
 
-  if (TREE_CODE (mask) == VECTOR_CST)
-    {
-      auto_vec_perm_indices sel_int (elements);
-
-      for (i = 0; i < elements; ++i)
-	sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
-			    & (2 * elements - 1));
+  vec_perm_builder sel_int;
 
-      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
+  if (TREE_CODE (mask) == VECTOR_CST
+      && tree_to_vec_perm_builder (&sel_int, mask))
+    {
+      vec_perm_indices indices (sel_int, 2, elements);
+      if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices))
 	{
 	  gimple_assign_set_rhs3 (stmt, mask);
 	  update_stmt (stmt);
@@ -1319,14 +1317,14 @@ lower_vec_perm (gimple_stmt_iterator *gs
 	  != CODE_FOR_nothing
 	  && TREE_CODE (vec1) == VECTOR_CST
 	  && initializer_zerop (vec1)
-	  && sel_int[0]
-	  && sel_int[0] < elements)
+	  && indices[0]
+	  && indices[0] < elements)
 	{
 	  for (i = 1; i < elements; ++i)
 	    {
-	      unsigned int expected = i + sel_int[0];
+	      unsigned int expected = i + indices[0];
 	      /* Indices into the second vector are all equivalent.  */
-	      if (MIN (elements, (unsigned) sel_int[i])
+	      if (MIN (elements, (unsigned) indices[i])
 		  != MIN (elements, expected))
  		break;
 	    }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2018-01-02 17:01:26.631719765 +0000
+++ gcc/tree-vect-loop.c	2018-01-02 17:01:28.749627263 +0000
@@ -3713,12 +3713,11 @@ vect_estimate_min_profitable_iters (loop
    vector elements (not bits) for a vector with NELT elements.  */
 static void
 calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
-			      vec_perm_indices *sel)
+			      vec_perm_builder *sel)
 {
-  unsigned int i;
-
-  for (i = 0; i < nelt; i++)
-    sel->quick_push ((i + offset) & (2 * nelt - 1));
+  sel->new_vector (nelt, nelt, 1);
+  for (unsigned int i = 0; i < nelt; i++)
+    sel->quick_push (i + offset);
 }
 
 /* Checks whether the target supports whole-vector shifts for vectors of mode
@@ -3731,13 +3730,13 @@ have_whole_vector_shift (machine_mode mo
     return true;
 
   unsigned int i, nelt = GET_MODE_NUNITS (mode);
-  auto_vec_perm_indices sel (nelt);
-
+  vec_perm_builder sel;
+  vec_perm_indices indices;
   for (i = nelt/2; i >= 1; i/=2)
     {
-      sel.truncate (0);
       calc_vec_perm_mask_for_shift (i, nelt, &sel);
-      if (!can_vec_perm_const_p (mode, sel, false))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (mode, indices, false))
 	return false;
     }
   return true;
@@ -5055,7 +5054,8 @@ vect_create_epilog_for_reduction (vec<tr
       if (reduce_with_shift && !slp_reduc)
         {
           int nelements = vec_size_in_bits / element_bitsize;
-          auto_vec_perm_indices sel (nelements);
+	  vec_perm_builder sel;
+	  vec_perm_indices indices;
 
           int elt_offset;
 
@@ -5079,9 +5079,9 @@ vect_create_epilog_for_reduction (vec<tr
                elt_offset >= 1;
                elt_offset /= 2)
             {
-	      sel.truncate (0);
 	      calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
-	      tree mask = vect_gen_perm_mask_any (vectype, sel);
+	      indices.new_vector (sel, 2, nelements);
+	      tree mask = vect_gen_perm_mask_any (vectype, indices);
 	      epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR,
 						 new_temp, zero_vec, mask);
               new_name = make_ssa_name (vec_dest, epilog_stmt);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2018-01-02 17:01:26.602721036 +0000
+++ gcc/config/aarch64/aarch64.c	2018-01-02 17:01:28.736627829 +0000
@@ -13252,7 +13252,7 @@ #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -13642,10 +13642,7 @@ aarch64_expand_vec_perm_const_1 (struct
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      gcc_assert (nelt == (nelt & -nelt));
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] ^= nelt; /* Keep the same index, but in the other vector.  */
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -13685,12 +13682,10 @@ aarch64_vectorize_vec_perm_const (machin
 
   /* Calculate whether all elements are in one vector.  */
   unsigned int nelt = sel.length ();
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       unsigned int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -13709,8 +13704,6 @@ aarch64_vectorize_vec_perm_const (machin
 	 input vector.  */
       /* Fall Through.  */
     case 2:
-      for (i = 0; i < nelt; ++i)
-	d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -13721,6 +13714,8 @@ aarch64_vectorize_vec_perm_const (machin
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (!d.testing_p)
     return aarch64_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2018-01-02 17:01:26.604720948 +0000
+++ gcc/config/arm/arm.c	2018-01-02 17:01:28.739627698 +0000
@@ -28854,7 +28854,7 @@ #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -29362,9 +29362,7 @@ arm_expand_vec_perm_const_1 (struct expa
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -29404,12 +29402,10 @@ arm_vectorize_vec_perm_const (machine_mo
   d.testing_p = !target;
 
   nelt = GET_MODE_NUNITS (d.vmode);
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -29428,8 +29424,6 @@ arm_vectorize_vec_perm_const (machine_mo
 	 input vector.  */
       /* FALLTHRU */
     case 2:
-      for (i = 0; i < nelt; ++i)
-        d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -29440,6 +29434,8 @@ arm_vectorize_vec_perm_const (machine_mo
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (d.testing_p)
     return arm_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/powerpcspe/powerpcspe.c
===================================================================
--- gcc/config/powerpcspe/powerpcspe.c	2018-01-02 17:01:26.617720379 +0000
+++ gcc/config/powerpcspe/powerpcspe.c	2018-01-02 17:01:28.742627568 +0000
@@ -38782,7 +38782,7 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -38797,7 +38797,7 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	2018-01-02 17:01:26.621720203 +0000
+++ gcc/config/rs6000/rs6000.c	2018-01-02 17:01:28.745627437 +0000
@@ -36042,7 +36042,7 @@ rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -36057,7 +36057,7 @@ rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)

Patch

Index: gcc/int-vector-builder.h
===================================================================
--- /dev/null	2017-12-09 13:59:56.352713187 +0000
+++ gcc/int-vector-builder.h	2017-12-09 22:48:47.545825268 +0000
@@ -0,0 +1,90 @@ 
+/* A class for building vector integer constants.
+   Copyright (C) 2017 Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+<http://www.gnu.org/licenses/>.  */
+
+#ifndef GCC_INT_VECTOR_BUILDER_H
+#define GCC_INT_VECTOR_BUILDER_H 1
+
+#include "vector-builder.h"
+
+/* This class is used to build vectors of integer type T using the same
+   encoding as tree and rtx constants.  See vector_builder for more
+   details.  */
+template<typename T>
+class int_vector_builder : public vector_builder<T, int_vector_builder<T> >
+{
+  typedef vector_builder<T, int_vector_builder> parent;
+  friend class vector_builder<T, int_vector_builder>;
+
+public:
+  int_vector_builder () {}
+  int_vector_builder (unsigned int, unsigned int, unsigned int);
+
+  using parent::new_vector;
+
+private:
+  bool equal_p (T, T) const;
+  bool allow_steps_p () const { return true; }
+  bool integral_p (T) const { return true; }
+  T step (T, T) const;
+  T apply_step (T, unsigned int, T) const;
+  bool can_elide_p (T) const { return true; }
+  void note_representative (T *, T) {}
+};
+
+/* Create a new builder for a vector with FULL_NELTS elements.
+   Initially encode the value as NPATTERNS interleaved patterns with
+   NELTS_PER_PATTERN elements each.  */
+
+template<typename T>
+inline
+int_vector_builder<T>::int_vector_builder (unsigned int full_nelts,
+					   unsigned int npatterns,
+					   unsigned int nelts_per_pattern)
+{
+  new_vector (full_nelts, npatterns, nelts_per_pattern);
+}
+
+/* Return true if elements ELT1 and ELT2 are equal.  */
+
+template<typename T>
+inline bool
+int_vector_builder<T>::equal_p (T elt1, T elt2) const
+{
+  return elt1 == elt2;
+}
+
+/* Return the value of element ELT2 minus the value of element ELT1.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::step (T elt1, T elt2) const
+{
+  return elt2 - elt1;
+}
+
+/* Return a vector element with the value BASE + FACTOR * STEP.  */
+
+template<typename T>
+inline T
+int_vector_builder<T>::apply_step (T base, unsigned int factor, T step) const
+{
+  return base + factor * step;
+}
+
+#endif
Index: gcc/vec-perm-indices.h
===================================================================
--- gcc/vec-perm-indices.h	2017-12-09 22:47:27.885318101 +0000
+++ gcc/vec-perm-indices.h	2017-12-09 22:48:47.548825399 +0000
@@ -20,30 +20,102 @@  Software Foundation; either version 3, o
 #ifndef GCC_VEC_PERN_INDICES_H
 #define GCC_VEC_PERN_INDICES_H 1
 
+#include "int-vector-builder.h"
+
+/* A vector_builder for building constant permutation vectors.
+   The elements do not need to be clamped to a particular range
+   of input elements.  */
+typedef int_vector_builder<HOST_WIDE_INT> vec_perm_builder;
+
 /* This class represents a constant permutation vector, such as that used
-   as the final operand to a VEC_PERM_EXPR.  */
-class vec_perm_indices : public auto_vec<unsigned short, 32>
+   as the final operand to a VEC_PERM_EXPR.  The vector is canonicalized
+   for a particular number of input vectors and for a particular number
+   of elements per input.  The class copes with cases in which the
+   input and output vectors have different numbers of elements.  */
+class vec_perm_indices
 {
-  typedef unsigned short element_type;
-  typedef auto_vec<element_type, 32> parent_type;
+  typedef HOST_WIDE_INT element_type;
 
 public:
-  vec_perm_indices () {}
-  vec_perm_indices (unsigned int nunits) : parent_type (nunits) {}
+  vec_perm_indices ();
+  vec_perm_indices (const vec_perm_builder &, unsigned int, unsigned int);
 
+  void new_vector (const vec_perm_builder &, unsigned int, unsigned int);
   void new_expanded_vector (const vec_perm_indices &, unsigned int);
+  void rotate_inputs (int delta);
+
+  /* Return the underlying vector encoding.  */
+  const vec_perm_builder &encoding () const { return m_encoding; }
+
+  /* Return the number of output elements.  This is called length ()
+     so that we present a more vec-like interface.  */
+  unsigned int length () const { return m_encoding.full_nelts (); }
+
+  /* Return the number of input vectors being permuted.  */
+  unsigned int ninputs () const { return m_ninputs; }
 
+  /* Return the number of elements in each input vector.  */
+  unsigned int nelts_per_input () const { return m_nelts_per_input; }
+
+  /* Return the total number of input elements.  */
+  unsigned int input_nelts () const { return m_ninputs * m_nelts_per_input; }
+
+  element_type clamp (element_type) const;
+  element_type operator[] (unsigned int i) const;
   bool all_in_range_p (element_type, element_type) const;
 
 private:
   vec_perm_indices (const vec_perm_indices &);
-};
 
-/* Temporary.  */
-typedef vec_perm_indices vec_perm_builder;
-typedef vec_perm_indices auto_vec_perm_indices;
+  vec_perm_builder m_encoding;
+  unsigned int m_ninputs;
+  unsigned int m_nelts_per_input;
+};
 
 bool tree_to_vec_perm_builder (vec_perm_builder *, tree);
 rtx vec_perm_indices_to_rtx (machine_mode, const vec_perm_indices &);
 
+inline
+vec_perm_indices::vec_perm_indices ()
+  : m_ninputs (0),
+    m_nelts_per_input (0)
+{
+}
+
+/* Construct a permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of
+   the new vector from ELEMENTS, clamping each one to be in range.  */
+
+inline
+vec_perm_indices::vec_perm_indices (const vec_perm_builder &elements,
+				    unsigned int ninputs,
+				    unsigned int nelts_per_input)
+{
+  new_vector (elements, ninputs, nelts_per_input);
+}
+
+/* Return the canonical value for permutation vector element ELT,
+   taking into account the current number of input elements.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::clamp (element_type elt) const
+{
+  element_type limit = input_nelts ();
+  elt %= limit;
+  /* Treat negative elements as counting from the end.  This only matters
+     if the vector size is not a power of 2.  */
+  if (elt < 0)
+    elt += limit;
+  return elt;
+}
+
+/* Return the value of vector element I, which might or might not be
+   explicitly encoded.  */
+
+inline vec_perm_indices::element_type
+vec_perm_indices::operator[] (unsigned int i) const
+{
+  return clamp (m_encoding.elt (i));
+}
+
 #endif
Index: gcc/vec-perm-indices.c
===================================================================
--- gcc/vec-perm-indices.c	2017-12-09 22:47:27.885318101 +0000
+++ gcc/vec-perm-indices.c	2017-12-09 22:48:47.548825399 +0000
@@ -22,11 +22,33 @@  Software Foundation; either version 3, o
 #include "coretypes.h"
 #include "vec-perm-indices.h"
 #include "tree.h"
+#include "fold-const.h"
+#include "tree-vector-builder.h"
 #include "backend.h"
 #include "rtl.h"
 #include "memmodel.h"
 #include "emit-rtl.h"
 
+/* Switch to a new permutation vector that selects between NINPUTS vector
+   inputs that have NELTS_PER_INPUT elements each.  Take the elements of the
+   new permutation vector from ELEMENTS, clamping each one to be in range.  */
+
+void
+vec_perm_indices::new_vector (const vec_perm_builder &elements,
+			      unsigned int ninputs,
+			      unsigned int nelts_per_input)
+{
+  m_ninputs = ninputs;
+  m_nelts_per_input = nelts_per_input;
+  /* Expand the encoding and clamp each element.  E.g. { 0, 2, 4, ... }
+     might wrap halfway if there is only one vector input.  */
+  unsigned int full_nelts = elements.full_nelts ();
+  m_encoding.new_vector (full_nelts, full_nelts, 1);
+  for (unsigned int i = 0; i < full_nelts; ++i)
+    m_encoding.quick_push (clamp (elements.elt (i)));
+  m_encoding.finalize ();
+}
+
 /* Switch to a new permutation vector that selects the same input elements
    as ORIG, but with each element split into FACTOR pieces.  For example,
    if ORIG is { 1, 2, 0, 3 } and FACTOR is 2, the new permutation is
@@ -36,14 +58,31 @@  Software Foundation; either version 3, o
 vec_perm_indices::new_expanded_vector (const vec_perm_indices &orig,
 				       unsigned int factor)
 {
-  truncate (0);
-  reserve (orig.length () * factor);
-  for (unsigned int i = 0; i < orig.length (); ++i)
+  m_ninputs = orig.m_ninputs;
+  m_nelts_per_input = orig.m_nelts_per_input * factor;
+  m_encoding.new_vector (orig.m_encoding.full_nelts () * factor,
+			 orig.m_encoding.npatterns () * factor,
+			 orig.m_encoding.nelts_per_pattern ());
+  unsigned int encoded_nelts = orig.m_encoding.encoded_nelts ();
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
     {
-      element_type base = orig[i] * factor;
+      element_type base = orig.m_encoding[i] * factor;
       for (unsigned int j = 0; j < factor; ++j)
-	quick_push (base + j);
+	m_encoding.quick_push (base + j);
     }
+  m_encoding.finalize ();
+}
+
+/* Rotate the inputs of the permutation right by DELTA inputs.  This changes
+   the values of the permutation vector but it doesn't change the way that
+   the elements are encoded.  */
+
+void
+vec_perm_indices::rotate_inputs (int delta)
+{
+  element_type element_delta = delta * m_nelts_per_input;
+  for (unsigned int i = 0; i < m_encoding.length (); ++i)
+    m_encoding[i] = clamp (m_encoding[i] + element_delta);
 }
 
 /* Return true if all elements of the permutation vector are in the range
@@ -52,9 +91,44 @@  vec_perm_indices::new_expanded_vector (c
 bool
 vec_perm_indices::all_in_range_p (element_type start, element_type size) const
 {
-  for (unsigned int i = 0; i < length (); ++i)
-    if ((*this)[i] < start || ((*this)[i] - start) >= size)
+  /* Check the first two elements of each pattern.  */
+  unsigned int npatterns = m_encoding.npatterns ();
+  unsigned int nelts_per_pattern = m_encoding.nelts_per_pattern ();
+  unsigned int base_nelts = npatterns * MIN (nelts_per_pattern, 2);
+  for (unsigned int i = 0; i < base_nelts; ++i)
+    if (m_encoding[i] < start || (m_encoding[i] - start) >= size)
       return false;
+
+  /* For stepped encodings, check the full range of the series.  */
+  if (nelts_per_pattern == 3)
+    {
+      element_type limit = input_nelts ();
+
+      /* The number of elements in each pattern beyond the first two
+	 that we checked above.  */
+      unsigned int step_nelts = (m_encoding.full_nelts () / npatterns) - 2;
+      for (unsigned int i = 0; i < npatterns; ++i)
+	{
+	  /* BASE1 has been checked but BASE2 hasn't.   */
+	  element_type base1 = m_encoding[i + npatterns];
+	  element_type base2 = m_encoding[i + base_nelts];
+
+	  /* The step to add to get from BASE1 to each subsequent value.  */
+	  element_type step = clamp (base2 - base1);
+
+	  /* STEP has no inherent sign, so a value near LIMIT can
+	     act as a negative step.  The series is in range if it
+	     is in range according to one of the two interpretations.
+
+	     Since we're dealing with clamped values, ELEMENT_TYPE is
+	     wide enough for overflow not to be a problem.  */
+	  element_type headroom_down = base1 - start;
+	  element_type headroom_up = size - headroom_down - 1;
+	  if (headroom_up < step * step_nelts
+	      && headroom_down < (limit - step) * step_nelts)
+	    return false;
+	}
+    }
   return true;
 }
 
@@ -65,15 +139,16 @@  vec_perm_indices::all_in_range_p (elemen
 bool
 tree_to_vec_perm_builder (vec_perm_builder *builder, tree cst)
 {
-  unsigned int nelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst));
-  for (unsigned int i = 0; i < nelts; ++i)
-    if (!tree_fits_shwi_p (vector_cst_elt (cst, i)))
+  unsigned int encoded_nelts = vector_cst_encoded_nelts (cst);
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    if (!tree_fits_shwi_p (VECTOR_CST_ENCODED_ELT (cst, i)))
       return false;
 
-  builder->reserve (nelts);
-  for (unsigned int i = 0; i < nelts; ++i)
-    builder->quick_push (tree_to_shwi (vector_cst_elt (cst, i))
-			 & (2 * nelts - 1));
+  builder->new_vector (TYPE_VECTOR_SUBPARTS (TREE_TYPE (cst)),
+		       VECTOR_CST_NPATTERNS (cst),
+		       VECTOR_CST_NELTS_PER_PATTERN (cst));
+  for (unsigned int i = 0; i < encoded_nelts; ++i)
+    builder->quick_push (tree_to_shwi (VECTOR_CST_ENCODED_ELT (cst, i)));
   return true;
 }
 
Index: gcc/optabs.c
===================================================================
--- gcc/optabs.c	2017-12-09 22:47:27.881318099 +0000
+++ gcc/optabs.c	2017-12-09 22:48:47.546825312 +0000
@@ -5456,6 +5456,11 @@  expand_vec_perm_const (machine_mode mode
   rtx_insn *last = get_last_insn ();
 
   bool single_arg_p = rtx_equal_p (v0, v1);
+  /* Always specify two input vectors here and leave the target to handle
+     cases in which the inputs are equal.  Not all backends can cope with
+     the single-input representation when testing for a double-input
+     target instruction.  */
+  vec_perm_indices indices (sel, 2, GET_MODE_NUNITS (mode));
 
   /* See if this can be handled with a vec_shr.  We only do this if the
      second vector is all zeroes.  */
@@ -5468,7 +5473,7 @@  expand_vec_perm_const (machine_mode mode
       && (shift_code != CODE_FOR_nothing
 	  || shift_code_qi != CODE_FOR_nothing))
     {
-      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, sel);
+      rtx shift_amt = shift_amt_for_vec_perm_mask (mode, indices);
       if (shift_amt)
 	{
 	  struct expand_operand ops[3];
@@ -5500,7 +5505,7 @@  expand_vec_perm_const (machine_mode mode
       else
 	v1 = force_reg (mode, v1);
 
-      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, sel))
+      if (targetm.vectorize.vec_perm_const (mode, target, v0, v1, indices))
 	return target;
     }
 
@@ -5509,7 +5514,7 @@  expand_vec_perm_const (machine_mode mode
   rtx target_qi = NULL_RTX, v0_qi = NULL_RTX, v1_qi = NULL_RTX;
   if (qimode != VOIDmode)
     {
-      qimode_indices.new_expanded_vector (sel, GET_MODE_UNIT_SIZE (mode));
+      qimode_indices.new_expanded_vector (indices, GET_MODE_UNIT_SIZE (mode));
       target_qi = gen_reg_rtx (qimode);
       v0_qi = gen_lowpart (qimode, v0);
       v1_qi = gen_lowpart (qimode, v1);
@@ -5536,7 +5541,7 @@  expand_vec_perm_const (machine_mode mode
      REQUIRED_SEL_MODE is OK.  */
   if (sel_mode != required_sel_mode)
     {
-      if (!selector_fits_mode_p (required_sel_mode, sel))
+      if (!selector_fits_mode_p (required_sel_mode, indices))
 	{
 	  delete_insns_since (last);
 	  return NULL_RTX;
@@ -5547,7 +5552,7 @@  expand_vec_perm_const (machine_mode mode
   insn_code icode = direct_optab_handler (vec_perm_optab, mode);
   if (icode != CODE_FOR_nothing)
     {
-      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, sel);
+      rtx sel_rtx = vec_perm_indices_to_rtx (sel_mode, indices);
       rtx tmp = expand_vec_perm_1 (icode, target, v0, v1, sel_rtx);
       if (tmp)
 	return tmp;
@@ -5621,7 +5626,7 @@  expand_vec_perm_var (machine_mode mode,
   gcc_assert (sel != NULL);
 
   /* Broadcast the low byte each element into each of its bytes.  */
-  vec_perm_builder const_sel (w);
+  vec_perm_builder const_sel (w, w, 1);
   for (i = 0; i < w; ++i)
     {
       int this_e = i / u * u;
@@ -5848,7 +5853,7 @@  expand_mult_highpart (machine_mode mode,
   expand_insn (optab_handler (tab2, mode), 3, eops);
   m2 = gen_lowpart (mode, eops[0].value);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   if (method == 2)
     {
       for (i = 0; i < nunits; ++i)
Index: gcc/optabs-query.c
===================================================================
--- gcc/optabs-query.c	2017-12-09 22:47:27.881318099 +0000
+++ gcc/optabs-query.c	2017-12-09 22:48:47.545825268 +0000
@@ -501,12 +501,13 @@  can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_odd_optab : vec_widen_smult_odd_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (!BYTES_BIG_ENDIAN
 			    + (i & ~1)
 			    + ((i & 1) ? nunits : 0));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 2;
 	}
     }
@@ -517,10 +518,11 @@  can_mult_highpart_p (machine_mode mode,
       op = uns_p ? vec_widen_umult_lo_optab : vec_widen_smult_lo_optab;
       if (optab_handler (op, mode) != CODE_FOR_nothing)
 	{
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (2 * i + (BYTES_BIG_ENDIAN ? 0 : 1));
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nunits);
+	  if (can_vec_perm_const_p (mode, indices))
 	    return 3;
 	}
     }
Index: gcc/fold-const.c
===================================================================
--- gcc/fold-const.c	2017-12-09 22:47:27.881318099 +0000
+++ gcc/fold-const.c	2017-12-09 22:48:47.545825268 +0000
@@ -11217,7 +11217,7 @@  fold_ternary_loc (location_t loc, enum t
 	    {
 	      unsigned int nelts = VECTOR_CST_NELTS (arg0), i;
 	      gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	      auto_vec_perm_indices sel (nelts);
+	      vec_perm_builder sel (nelts, nelts, 1);
 	      for (i = 0; i < nelts; i++)
 		{
 		  tree val = VECTOR_CST_ELT (arg0, i);
@@ -11228,7 +11228,8 @@  fold_ternary_loc (location_t loc, enum t
 		  else /* Currently unreachable.  */
 		    return NULL_TREE;
 		}
-	      tree t = fold_vec_perm (type, arg1, arg2, sel);
+	      tree t = fold_vec_perm (type, arg1, arg2,
+				      vec_perm_indices (sel, 2, nelts));
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11558,8 +11559,8 @@  fold_ternary_loc (location_t loc, enum t
 	  mask2 = 2 * nelts - 1;
 	  mask = single_arg ? (nelts - 1) : mask2;
 	  gcc_assert (nelts == TYPE_VECTOR_SUBPARTS (type));
-	  auto_vec_perm_indices sel (nelts);
-	  auto_vec_perm_indices sel2 (nelts);
+	  vec_perm_builder sel (nelts, nelts, 1);
+	  vec_perm_builder sel2 (nelts, nelts, 1);
 	  for (i = 0; i < nelts; i++)
 	    {
 	      tree val = VECTOR_CST_ELT (arg2, i);
@@ -11604,12 +11605,13 @@  fold_ternary_loc (location_t loc, enum t
 	      need_mask_canon = true;
 	    }
 
+	  vec_perm_indices indices (sel, 2, nelts);
 	  if ((TREE_CODE (op0) == VECTOR_CST
 	       || TREE_CODE (op0) == CONSTRUCTOR)
 	      && (TREE_CODE (op1) == VECTOR_CST
 		  || TREE_CODE (op1) == CONSTRUCTOR))
 	    {
-	      tree t = fold_vec_perm (type, op0, op1, sel);
+	      tree t = fold_vec_perm (type, op0, op1, indices);
 	      if (t != NULL_TREE)
 		return t;
 	    }
@@ -11621,11 +11623,14 @@  fold_ternary_loc (location_t loc, enum t
 	     argument permutation while still allowing an equivalent
 	     2-argument version.  */
 	  if (need_mask_canon && arg2 == op2
-	      && !can_vec_perm_const_p (TYPE_MODE (type), sel, false)
-	      && can_vec_perm_const_p (TYPE_MODE (type), sel2, false))
+	      && !can_vec_perm_const_p (TYPE_MODE (type), indices, false)
+	      && can_vec_perm_const_p (TYPE_MODE (type),
+				       vec_perm_indices (sel2, 2, nelts),
+				       false))
 	    {
 	      need_mask_canon = need_mask_canon2;
-	      sel = sel2;
+	      sel.truncate (0);
+	      sel.splice (sel2);
 	    }
 
 	  if (need_mask_canon && arg2 == op2)
Index: gcc/tree-ssa-forwprop.c
===================================================================
--- gcc/tree-ssa-forwprop.c	2017-12-09 22:47:27.883318100 +0000
+++ gcc/tree-ssa-forwprop.c	2017-12-09 22:48:47.546825312 +0000
@@ -2019,7 +2019,7 @@  simplify_vector_constructor (gimple_stmt
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
 
-  auto_vec_perm_indices sel (nelts);
+  vec_perm_builder sel (nelts, nelts, 1);
   orig = NULL;
   conv_code = ERROR_MARK;
   maybe_ident = true;
@@ -2109,7 +2109,8 @@  simplify_vector_constructor (gimple_stmt
     {
       tree mask_type;
 
-      if (!can_vec_perm_const_p (TYPE_MODE (type), sel))
+      vec_perm_indices indices (sel, 1, nelts);
+      if (!can_vec_perm_const_p (TYPE_MODE (type), indices))
 	return false;
       mask_type
 	= build_vector_type (build_nonstandard_integer_type (elem_size, 1),
Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c	2017-12-09 22:47:27.883318100 +0000
+++ gcc/tree-vect-data-refs.c	2017-12-09 22:48:47.546825312 +0000
@@ -4566,7 +4566,7 @@  vect_grouped_store_supported (tree vecty
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
@@ -4574,6 +4574,7 @@  vect_grouped_store_supported (tree vecty
 	  unsigned int j0 = 0, j1 = 0, j2 = 0;
 	  unsigned int i, j;
 
+	  vec_perm_indices indices;
 	  for (j = 0; j < 3; j++)
 	    {
 	      int nelt0 = ((3 - j) * nelt) % 3;
@@ -4588,7 +4589,8 @@  vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = 0;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4605,7 +4607,8 @@  vect_grouped_store_supported (tree vecty
 		  if (3 * i + nelt2 < nelt)
 		    sel[3 * i + nelt2] = nelt + j2++;
 		}
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf (MSG_MISSED_OPTIMIZATION,
@@ -4625,11 +4628,13 @@  vect_grouped_store_supported (tree vecty
 	      sel[i * 2] = i;
 	      sel[i * 2 + 1] = i + nelt;
 	    }
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] += nelt / 2;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
 	}
@@ -4731,7 +4736,7 @@  vect_permute_store_chain (vec<tree> dr_c
   unsigned int i, n, log_length = exact_log2 (length);
   unsigned int j, nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -4742,6 +4747,7 @@  vect_permute_store_chain (vec<tree> dr_c
     {
       unsigned int j0 = 0, j1 = 0, j2 = 0;
 
+      vec_perm_indices indices;
       for (j = 0; j < 3; j++)
         {
 	  int nelt0 = ((3 - j) * nelt) % 3;
@@ -4757,7 +4763,8 @@  vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = 0;
 	    }
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0; i < nelt; i++)
 	    {
@@ -4768,7 +4775,8 @@  vect_permute_store_chain (vec<tree> dr_c
 	      if (3 * i + nelt2 < nelt)
 		sel[3 * i + nelt2] = nelt + j2++;
 	    }
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  vect1 = dr_chain[0];
 	  vect2 = dr_chain[1];
@@ -4805,11 +4813,13 @@  vect_permute_store_chain (vec<tree> dr_c
 	  sel[i * 2] = i;
 	  sel[i * 2 + 1] = i + nelt;
 	}
-	perm_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	vec_perm_indices indices (sel, 2, nelt);
+	perm_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0; i < nelt; i++)
 	  sel[i] += nelt / 2;
-	perm_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	indices.new_vector (sel, 2, nelt);
+	perm_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	for (i = 0, n = log_length; i < n; i++)
 	  {
@@ -5154,11 +5164,12 @@  vect_grouped_load_supported (tree vectyp
   if (VECTOR_MODE_P (mode))
     {
       unsigned int i, j, nelt = GET_MODE_NUNITS (mode);
-      auto_vec_perm_indices sel (nelt);
+      vec_perm_builder sel (nelt, nelt, 1);
       sel.quick_grow (nelt);
 
       if (count == 3)
 	{
+	  vec_perm_indices indices;
 	  unsigned int k;
 	  for (k = 0; k < 3; k++)
 	    {
@@ -5167,7 +5178,8 @@  vect_grouped_load_supported (tree vectyp
 		  sel[i] = 3 * i + k;
 		else
 		  sel[i] = 0;
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5180,7 +5192,8 @@  vect_grouped_load_supported (tree vectyp
 		  sel[i] = i;
 		else
 		  sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-	      if (!can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5195,13 +5208,16 @@  vect_grouped_load_supported (tree vectyp
 	{
 	  /* If length is not equal to 3 then only power of 2 is supported.  */
 	  gcc_assert (pow2p_hwi (count));
+
 	  for (i = 0; i < nelt; i++)
 	    sel[i] = i * 2;
-	  if (can_vec_perm_const_p (mode, sel))
+	  vec_perm_indices indices (sel, 2, nelt);
+	  if (can_vec_perm_const_p (mode, indices))
 	    {
 	      for (i = 0; i < nelt; i++)
 		sel[i] = i * 2 + 1;
-	      if (can_vec_perm_const_p (mode, sel))
+	      indices.new_vector (sel, 2, nelt);
+	      if (can_vec_perm_const_p (mode, indices))
 		return true;
 	    }
         }
@@ -5316,7 +5332,7 @@  vect_permute_load_chain (vec<tree> dr_ch
   unsigned int i, j, log_length = exact_log2 (length);
   unsigned nelt = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5327,6 +5343,7 @@  vect_permute_load_chain (vec<tree> dr_ch
     {
       unsigned int k;
 
+      vec_perm_indices indices;
       for (k = 0; k < 3; k++)
 	{
 	  for (i = 0; i < nelt; i++)
@@ -5334,15 +5351,16 @@  vect_permute_load_chain (vec<tree> dr_ch
 	      sel[i] = 3 * i + k;
 	    else
 	      sel[i] = 0;
-	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_low = vect_gen_perm_mask_checked (vectype, indices);
 
 	  for (i = 0, j = 0; i < nelt; i++)
 	    if (3 * i + k < 2 * nelt)
 	      sel[i] = i;
 	    else
 	      sel[i] = nelt + ((nelt + k) % 3) + 3 * (j++);
-
-	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, sel);
+	  indices.new_vector (sel, 2, nelt);
+	  perm3_mask_high = vect_gen_perm_mask_checked (vectype, indices);
 
 	  first_vect = dr_chain[0];
 	  second_vect = dr_chain[1];
@@ -5374,11 +5392,13 @@  vect_permute_load_chain (vec<tree> dr_ch
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2;
-      perm_mask_even = vect_gen_perm_mask_checked (vectype, sel);
+      vec_perm_indices indices (sel, 2, nelt);
+      perm_mask_even = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt; ++i)
 	sel[i] = i * 2 + 1;
-      perm_mask_odd = vect_gen_perm_mask_checked (vectype, sel);
+      indices.new_vector (sel, 2, nelt);
+      perm_mask_odd = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5514,7 +5534,7 @@  vect_shift_permute_load_chain (vec<tree>
   stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
   loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
 
-  auto_vec_perm_indices sel (nelt);
+  vec_perm_builder sel (nelt, nelt, 1);
   sel.quick_grow (nelt);
 
   result_chain->quick_grow (length);
@@ -5528,7 +5548,8 @@  vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i * 2;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2 + 1;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5536,13 +5557,14 @@  vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask1 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask1 = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < nelt / 2; ++i)
 	sel[i] = i * 2 + 1;
       for (i = 0; i < nelt / 2; ++i)
 	sel[nelt / 2 + i] = i * 2;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5550,20 +5572,21 @@  vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm2_mask2 = vect_gen_perm_mask_checked (vectype, sel);
+      perm2_mask2 = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {4 5 6 7 8 9 10 11}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = nelt / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to select vector from 2.
 	 For vector length 8 it is {0 1 2 3 12 13 14 15}.  */
@@ -5571,14 +5594,15 @@  vect_shift_permute_load_chain (vec<tree>
 	sel[i] = i;
       for (i = nelt / 2; i < nelt; i++)
 	sel[i] = nelt + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "select is not supported by target\n");
 	  return false;
 	}
-      select_mask = vect_gen_perm_mask_checked (vectype, sel);
+      select_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (i = 0; i < log_length; i++)
 	{
@@ -5634,7 +5658,8 @@  vect_shift_permute_load_chain (vec<tree>
 	  sel[i] = 3 * k + (l % 3);
 	  k++;
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
@@ -5642,59 +5667,63 @@  vect_shift_permute_load_chain (vec<tree>
 			      supported by target\n");
 	  return false;
 	}
-      perm3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      perm3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {6 7 8 9 10 11 12 13}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift1_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift1_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + 1 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift2_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift2_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {3 4 5 6 7 8 9 10}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift3_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift3_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       /* Generating permutation constant to shift all elements.
 	 For vector length 8 it is {5 6 7 8 9 10 11 12}.  */
       for (i = 0; i < nelt; i++)
 	sel[i] = 2 * (nelt / 3) + (nelt % 3) / 2 + i;
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  if (dump_enabled_p ())
 	    dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
 			     "shift permutation is not supported by target\n");
 	  return false;
 	}
-      shift4_mask = vect_gen_perm_mask_checked (vectype, sel);
+      shift4_mask = vect_gen_perm_mask_checked (vectype, indices);
 
       for (k = 0; k < 3; k++)
 	{
Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	2017-12-09 22:47:27.884318101 +0000
+++ gcc/tree-vect-slp.c	2017-12-09 22:48:47.547825355 +0000
@@ -894,7 +894,7 @@  vect_build_slp_tree_1 (vec_info *vinfo,
       && TREE_CODE_CLASS (alt_stmt_code) != tcc_reference)
     {
       unsigned int count = TYPE_VECTOR_SUBPARTS (vectype);
-      auto_vec_perm_indices sel (count);
+      vec_perm_builder sel (count, count, 1);
       for (i = 0; i < count; ++i)
 	{
 	  unsigned int elt = i;
@@ -902,7 +902,8 @@  vect_build_slp_tree_1 (vec_info *vinfo,
 	    elt += count;
 	  sel.quick_push (elt);
 	}
-      if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+      vec_perm_indices indices (sel, 2, count);
+      if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
 	{
 	  for (i = 0; i < group_size; ++i)
 	    if (gimple_assign_rhs_code (stmts[i]) == alt_stmt_code)
@@ -3570,8 +3571,9 @@  vect_transform_slp_perm_load (slp_tree n
     (int_mode_for_mode (TYPE_MODE (TREE_TYPE (vectype))).require (), 1);
   mask_type = get_vectype_for_scalar_type (mask_element_type);
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
-  auto_vec_perm_indices mask (nunits);
+  vec_perm_builder mask (nunits, nunits, 1);
   mask.quick_grow (nunits);
+  vec_perm_indices indices;
 
   /* Initialize the vect stmts of NODE to properly insert the generated
      stmts later.  */
@@ -3644,10 +3646,10 @@  vect_transform_slp_perm_load (slp_tree n
 	    noop_p = false;
 	  mask[index++] = mask_element;
 
-	  if (index == nunits)
+	  if (index == nunits && !noop_p)
 	    {
-	      if (! noop_p
-		  && ! can_vec_perm_const_p (mode, mask))
+	      indices.new_vector (mask, 2, nunits);
+	      if (!can_vec_perm_const_p (mode, indices))
 		{
 		  if (dump_enabled_p ())
 		    {
@@ -3655,16 +3657,19 @@  vect_transform_slp_perm_load (slp_tree n
 				       vect_location, 
 				       "unsupported vect permute { ");
 		      for (i = 0; i < nunits; ++i)
-			dump_printf (MSG_MISSED_OPTIMIZATION, "%d ", mask[i]);
+			dump_printf (MSG_MISSED_OPTIMIZATION,
+				     HOST_WIDE_INT_PRINT_DEC " ", mask[i]);
 		      dump_printf (MSG_MISSED_OPTIMIZATION, "}\n");
 		    }
 		  gcc_assert (analyze_only);
 		  return false;
 		}
 
-	      if (! noop_p)
-		++*n_perms;
+	      ++*n_perms;
+	    }
 
+	  if (index == nunits)
+	    {
 	      if (!analyze_only)
 		{
 		  tree mask_vec = NULL_TREE;
@@ -3797,7 +3802,7 @@  vect_schedule_slp_instance (slp_tree nod
       enum tree_code code0 = gimple_assign_rhs_code (stmt);
       enum tree_code ocode = ERROR_MARK;
       gimple *ostmt;
-      auto_vec_perm_indices mask (group_size);
+      vec_perm_builder mask (group_size, group_size, 1);
       FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, ostmt)
 	if (gimple_assign_rhs_code (ostmt) != code0)
 	  {
Index: gcc/tree-vect-stmts.c
===================================================================
--- gcc/tree-vect-stmts.c	2017-12-09 22:47:27.885318101 +0000
+++ gcc/tree-vect-stmts.c	2017-12-09 22:48:47.548825399 +0000
@@ -1717,13 +1717,14 @@  perm_mask_for_reverse (tree vectype)
 
   nunits = TYPE_VECTOR_SUBPARTS (vectype);
 
-  auto_vec_perm_indices sel (nunits);
+  vec_perm_builder sel (nunits, nunits, 1);
   for (i = 0; i < nunits; ++i)
     sel.quick_push (nunits - 1 - i);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (vectype), sel))
+  vec_perm_indices indices (sel, 1, nunits);
+  if (!can_vec_perm_const_p (TYPE_MODE (vectype), indices))
     return NULL_TREE;
-  return vect_gen_perm_mask_checked (vectype, sel);
+  return vect_gen_perm_mask_checked (vectype, indices);
 }
 
 /* A subroutine of get_load_store_type, with a subset of the same
@@ -2185,27 +2186,32 @@  vectorizable_mask_load_store (gimple *st
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  sel.quick_grow (nunits);
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i < gather_off_nunits
 		     ? i : i + nunits - gather_off_nunits;
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
 	  ncopies *= 2;
+
 	  for (i = 0; i < nunits; ++i)
 	    sel[i] = i | gather_off_nunits;
-	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, sel);
+	  indices.new_vector (sel, 2, gather_off_nunits);
+	  mask_perm_mask = vect_gen_perm_mask_checked (masktype, indices);
 	}
       else
 	gcc_unreachable ();
@@ -2498,12 +2504,13 @@  vectorizable_bswap (gimple *stmt, gimple
   unsigned int num_bytes = TYPE_VECTOR_SUBPARTS (char_vectype);
   unsigned word_bytes = num_bytes / nunits;
 
-  auto_vec_perm_indices elts (num_bytes);
+  vec_perm_builder elts (num_bytes, num_bytes, 1);
   for (unsigned i = 0; i < nunits; ++i)
     for (unsigned j = 0; j < word_bytes; ++j)
       elts.quick_push ((i + 1) * word_bytes - j - 1);
 
-  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), elts))
+  vec_perm_indices indices (elts, 1, num_bytes);
+  if (!can_vec_perm_const_p (TYPE_MODE (char_vectype), indices))
     return false;
 
   if (! vec_stmt)
@@ -5809,22 +5816,25 @@  vectorizable_store (gimple *stmt, gimple
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (scatter_off_nunits);
+	  vec_perm_builder sel (scatter_off_nunits, scatter_off_nunits, 1);
 	  for (i = 0; i < (unsigned int) scatter_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, scatter_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	}
       else if (nunits == (unsigned int) scatter_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < (unsigned int) nunits; ++i)
 	    sel.quick_push (i | scatter_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  gcc_assert (perm_mask != NULL_TREE);
 	  ncopies *= 2;
 	}
@@ -6845,22 +6855,25 @@  vectorizable_load (gimple *stmt, gimple_
 	{
 	  modifier = WIDEN;
 
-	  auto_vec_perm_indices sel (gather_off_nunits);
+	  vec_perm_builder sel (gather_off_nunits, gather_off_nunits, 1);
 	  for (i = 0; i < gather_off_nunits; ++i)
 	    sel.quick_push (i | nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype, sel);
+	  vec_perm_indices indices (sel, 1, gather_off_nunits);
+	  perm_mask = vect_gen_perm_mask_checked (gs_info.offset_vectype,
+						  indices);
 	}
       else if (nunits == gather_off_nunits * 2)
 	{
 	  modifier = NARROW;
 
-	  auto_vec_perm_indices sel (nunits);
+	  vec_perm_builder sel (nunits, nunits, 1);
 	  for (i = 0; i < nunits; ++i)
 	    sel.quick_push (i < gather_off_nunits
 			    ? i : i + nunits - gather_off_nunits);
 
-	  perm_mask = vect_gen_perm_mask_checked (vectype, sel);
+	  vec_perm_indices indices (sel, 2, nunits);
+	  perm_mask = vect_gen_perm_mask_checked (vectype, indices);
 	  ncopies *= 2;
 	}
       else
Index: gcc/tree-vect-generic.c
===================================================================
--- gcc/tree-vect-generic.c	2017-12-09 22:47:27.883318100 +0000
+++ gcc/tree-vect-generic.c	2017-12-09 22:48:47.547825355 +0000
@@ -1299,15 +1299,13 @@  lower_vec_perm (gimple_stmt_iterator *gs
 	mask = gimple_assign_rhs1 (def_stmt);
     }
 
-  if (TREE_CODE (mask) == VECTOR_CST)
-    {
-      auto_vec_perm_indices sel_int (elements);
-
-      for (i = 0; i < elements; ++i)
-	sel_int.quick_push (TREE_INT_CST_LOW (VECTOR_CST_ELT (mask, i))
-			    & (2 * elements - 1));
+  vec_perm_builder sel_int;
 
-      if (can_vec_perm_const_p (TYPE_MODE (vect_type), sel_int))
+  if (TREE_CODE (mask) == VECTOR_CST
+      && tree_to_vec_perm_builder (&sel_int, mask))
+    {
+      vec_perm_indices indices (sel_int, 2, elements);
+      if (can_vec_perm_const_p (TYPE_MODE (vect_type), indices))
 	{
 	  gimple_assign_set_rhs3 (stmt, mask);
 	  update_stmt (stmt);
@@ -1319,14 +1317,14 @@  lower_vec_perm (gimple_stmt_iterator *gs
 	  != CODE_FOR_nothing
 	  && TREE_CODE (vec1) == VECTOR_CST
 	  && initializer_zerop (vec1)
-	  && sel_int[0]
-	  && sel_int[0] < elements)
+	  && indices[0]
+	  && indices[0] < elements)
 	{
 	  for (i = 1; i < elements; ++i)
 	    {
-	      unsigned int expected = i + sel_int[0];
+	      unsigned int expected = i + indices[0];
 	      /* Indices into the second vector are all equivalent.  */
-	      if (MIN (elements, (unsigned) sel_int[i])
+	      if (MIN (elements, (unsigned) indices[i])
 		  != MIN (elements, expected))
  		break;
 	    }
Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	2017-12-09 22:47:27.884318101 +0000
+++ gcc/tree-vect-loop.c	2017-12-09 22:48:47.547825355 +0000
@@ -3714,12 +3714,11 @@  vect_estimate_min_profitable_iters (loop
    vector elements (not bits) for a vector with NELT elements.  */
 static void
 calc_vec_perm_mask_for_shift (unsigned int offset, unsigned int nelt,
-			      vec_perm_indices *sel)
+			      vec_perm_builder *sel)
 {
-  unsigned int i;
-
-  for (i = 0; i < nelt; i++)
-    sel->quick_push ((i + offset) & (2 * nelt - 1));
+  sel->new_vector (nelt, nelt, 1);
+  for (unsigned int i = 0; i < nelt; i++)
+    sel->quick_push (i + offset);
 }
 
 /* Checks whether the target supports whole-vector shifts for vectors of mode
@@ -3732,13 +3731,13 @@  have_whole_vector_shift (machine_mode mo
     return true;
 
   unsigned int i, nelt = GET_MODE_NUNITS (mode);
-  auto_vec_perm_indices sel (nelt);
-
+  vec_perm_builder sel;
+  vec_perm_indices indices;
   for (i = nelt/2; i >= 1; i/=2)
     {
-      sel.truncate (0);
       calc_vec_perm_mask_for_shift (i, nelt, &sel);
-      if (!can_vec_perm_const_p (mode, sel, false))
+      indices.new_vector (sel, 2, nelt);
+      if (!can_vec_perm_const_p (mode, indices, false))
 	return false;
     }
   return true;
@@ -5028,7 +5027,8 @@  vect_create_epilog_for_reduction (vec<tr
       if (reduce_with_shift && !slp_reduc)
         {
           int nelements = vec_size_in_bits / element_bitsize;
-          auto_vec_perm_indices sel (nelements);
+	  vec_perm_builder sel;
+	  vec_perm_indices indices;
 
           int elt_offset;
 
@@ -5052,9 +5052,9 @@  vect_create_epilog_for_reduction (vec<tr
                elt_offset >= 1;
                elt_offset /= 2)
             {
-	      sel.truncate (0);
 	      calc_vec_perm_mask_for_shift (elt_offset, nelements, &sel);
-	      tree mask = vect_gen_perm_mask_any (vectype, sel);
+	      indices.new_vector (sel, 2, nelements);
+	      tree mask = vect_gen_perm_mask_any (vectype, indices);
 	      epilog_stmt = gimple_build_assign (vec_dest, VEC_PERM_EXPR,
 						 new_temp, zero_vec, mask);
               new_name = make_ssa_name (vec_dest, epilog_stmt);
Index: gcc/config/aarch64/aarch64.c
===================================================================
--- gcc/config/aarch64/aarch64.c	2017-12-09 22:47:27.856318084 +0000
+++ gcc/config/aarch64/aarch64.c	2017-12-09 22:48:47.535824832 +0000
@@ -13208,7 +13208,7 @@  #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -13598,10 +13598,7 @@  aarch64_expand_vec_perm_const_1 (struct
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      gcc_assert (nelt == (nelt & -nelt));
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] ^= nelt; /* Keep the same index, but in the other vector.  */
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -13641,12 +13638,10 @@  aarch64_vectorize_vec_perm_const (machin
 
   /* Calculate whether all elements are in one vector.  */
   unsigned int nelt = sel.length ();
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       unsigned int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -13665,8 +13660,6 @@  aarch64_vectorize_vec_perm_const (machin
 	 input vector.  */
       /* Fall Through.  */
     case 2:
-      for (i = 0; i < nelt; ++i)
-	d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -13677,6 +13670,8 @@  aarch64_vectorize_vec_perm_const (machin
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (!d.testing_p)
     return aarch64_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/arm/arm.c
===================================================================
--- gcc/config/arm/arm.c	2017-12-09 22:47:27.858318085 +0000
+++ gcc/config/arm/arm.c	2017-12-09 22:48:47.538824963 +0000
@@ -28852,7 +28852,7 @@  #define MAX_VECT_LEN 16
 struct expand_vec_perm_d
 {
   rtx target, op0, op1;
-  auto_vec_perm_indices perm;
+  vec_perm_indices perm;
   machine_mode vmode;
   bool one_vector_p;
   bool testing_p;
@@ -29360,9 +29360,7 @@  arm_expand_vec_perm_const_1 (struct expa
   unsigned int nelt = d->perm.length ();
   if (d->perm[0] >= nelt)
     {
-      for (unsigned int i = 0; i < nelt; ++i)
-	d->perm[i] = (d->perm[i] + nelt) & (2 * nelt - 1);
-
+      d->perm.rotate_inputs (1);
       std::swap (d->op0, d->op1);
     }
 
@@ -29402,12 +29400,10 @@  arm_vectorize_vec_perm_const (machine_mo
   d.testing_p = !target;
 
   nelt = GET_MODE_NUNITS (d.vmode);
-  d.perm.reserve (nelt);
   for (i = which = 0; i < nelt; ++i)
     {
       int ei = sel[i] & (2 * nelt - 1);
       which |= (ei < nelt ? 1 : 2);
-      d.perm.quick_push (ei);
     }
 
   switch (which)
@@ -29426,8 +29422,6 @@  arm_vectorize_vec_perm_const (machine_mo
 	 input vector.  */
       /* FALLTHRU */
     case 2:
-      for (i = 0; i < nelt; ++i)
-        d.perm[i] &= nelt - 1;
       d.op0 = op1;
       d.one_vector_p = true;
       break;
@@ -29438,6 +29432,8 @@  arm_vectorize_vec_perm_const (machine_mo
       break;
     }
 
+  d.perm.new_vector (sel.encoding (), d.one_vector_p ? 1 : 2, nelt);
+
   if (d.testing_p)
     return arm_expand_vec_perm_const_1 (&d);
 
Index: gcc/config/powerpcspe/powerpcspe.c
===================================================================
--- gcc/config/powerpcspe/powerpcspe.c	2017-12-09 22:47:27.871318093 +0000
+++ gcc/config/powerpcspe/powerpcspe.c	2017-12-09 22:48:47.541825094 +0000
@@ -38780,7 +38780,7 @@  rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -38795,7 +38795,7 @@  rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)
Index: gcc/config/rs6000/rs6000.c
===================================================================
--- gcc/config/rs6000/rs6000.c	2017-12-09 22:47:27.874318095 +0000
+++ gcc/config/rs6000/rs6000.c	2017-12-09 22:48:47.544825224 +0000
@@ -36017,7 +36017,7 @@  rs6000_expand_extract_even (rtx target,
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   for (i = 0; i < nelt; i++)
     perm.quick_push (i * 2);
@@ -36032,7 +36032,7 @@  rs6000_expand_interleave (rtx target, rt
 {
   machine_mode vmode = GET_MODE (target);
   unsigned i, high, nelt = GET_MODE_NUNITS (vmode);
-  vec_perm_builder perm (nelt);
+  vec_perm_builder perm (nelt, nelt, 1);
 
   high = (highp ? 0 : nelt / 2);
   for (i = 0; i < nelt / 2; i++)