On Wed, 4 Jun 2025, Tamar Christina wrote: > > -----Original Message----- > > From: Richard Biener <rguent...@suse.de> > > Sent: Wednesday, June 4, 2025 8:34 AM > > To: Tamar Christina <tamar.christ...@arm.com> > > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Sandiford > > <richard.sandif...@arm.com>; Pengfei Li <pengfei....@arm.com>; gcc- > > patc...@gcc.gnu.org; ktkac...@nvidia.com > > Subject: RE: [PATCH] vect: Improve vectorization for small-trip-count loops > > using > > subvectors > > > > On Wed, 4 Jun 2025, Tamar Christina wrote: > > > > > > -----Original Message----- > > > > From: Richard Biener <rguent...@suse.de> > > > > Sent: Wednesday, June 4, 2025 8:04 AM > > > > To: Tamar Christina <tamar.christ...@arm.com> > > > > Cc: Richard Biener <richard.guent...@gmail.com>; Richard Sandiford > > > > <richard.sandif...@arm.com>; Pengfei Li <pengfei....@arm.com>; gcc- > > > > patc...@gcc.gnu.org; ktkac...@nvidia.com > > > > Subject: RE: [PATCH] vect: Improve vectorization for small-trip-count > > > > loops > > using > > > > subvectors > > > > > > > > On Tue, 3 Jun 2025, Tamar Christina wrote: > > > > > > > > > > -----Original Message----- > > > > > > From: Richard Biener <richard.guent...@gmail.com> > > > > > > Sent: Tuesday, June 3, 2025 2:12 PM > > > > > > To: Tamar Christina <tamar.christ...@arm.com> > > > > > > Cc: Richard Biener <rguent...@suse.de>; Richard Sandiford > > > > > > <richard.sandif...@arm.com>; Pengfei Li <pengfei....@arm.com>; gcc- > > > > > > patc...@gcc.gnu.org; ktkac...@nvidia.com > > > > > > Subject: Re: [PATCH] vect: Improve vectorization for > > > > > > small-trip-count loops > > > > using > > > > > > subvectors > > > > > > > > > > > > On Fri, May 9, 2025 at 4:05 PM Tamar Christina > > <tamar.christ...@arm.com> > > > > > > wrote: > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > From: Richard Biener <rguent...@suse.de> > > > > > > > > Sent: Friday, May 9, 2025 2:44 PM > > > > > > > > To: Tamar Christina <tamar.christ...@arm.com> > > > > > > > > Cc: Richard Sandiford <richard.sandif...@arm.com>; Pengfei Li > > > > > > > > <pengfei....@arm.com>; gcc-patches@gcc.gnu.org; > > ktkac...@nvidia.com > > > > > > > > Subject: RE: [PATCH] vect: Improve vectorization for > > > > > > > > small-trip-count > > loops > > > > > > using > > > > > > > > subvectors > > > > > > > > > > > > > > > > On Fri, 9 May 2025, Tamar Christina wrote: > > > > > > > > > > > > > > > > > > -----Original Message----- > > > > > > > > > > From: Richard Biener <rguent...@suse.de> > > > > > > > > > > Sent: Friday, May 9, 2025 11:08 AM > > > > > > > > > > To: Richard Sandiford <richard.sandif...@arm.com> > > > > > > > > > > Cc: Pengfei Li <pengfei....@arm.com>; > > > > > > > > > > gcc-patches@gcc.gnu.org; > > > > > > > > > > ktkac...@nvidia.com > > > > > > > > > > Subject: Re: [PATCH] vect: Improve vectorization for > > > > > > > > > > small-trip-count > > > > loops > > > > > > > > using > > > > > > > > > > subvectors > > > > > > > > > > > > > > > > > > > > On Fri, 9 May 2025, Richard Sandiford wrote: > > > > > > > > > > > > > > > > > > > > > Richard Biener <rguent...@suse.de> writes: > > > > > > > > > > > > On Thu, 8 May 2025, Pengfei Li wrote: > > > > > > > > > > > > > > > > > > > > > > > >> This patch improves the auto-vectorization for loops > > > > > > > > > > > >> with known > > > > small > > > > > > > > > > > >> trip counts by enabling the use of subvectors - bit > > > > > > > > > > > >> fields of > > original > > > > > > > > > > > >> wider vectors. 
A subvector must have the same vector > > > > > > > > > > > >> element > > type > > > > as > > > > > > the > > > > > > > > > > > >> original vector and enough bits for all vector > > > > > > > > > > > >> elements to be > > > > processed > > > > > > > > > > > >> in the loop. Using subvectors is beneficial because > > > > > > > > > > > >> machine > > > > instructions > > > > > > > > > > > >> operating on narrower vectors usually show better > > > > > > > > > > > >> performance. > > > > > > > > > > > >> > > > > > > > > > > > >> To enable this optimization, this patch introduces a > > > > > > > > > > > >> new target > > > > hook. > > > > > > > > > > > >> This hook allows the vectorizer to query the backend > > > > > > > > > > > >> for a > > suitable > > > > > > > > > > > >> subvector type given the original vector type and the > > > > > > > > > > > >> number of > > > > > > elements > > > > > > > > > > > >> to be processed in the small-trip-count loop. The > > > > > > > > > > > >> target hook > > also > > > > has a > > > > > > > > > > > >> could_trap parameter to say if the subvector is > > > > > > > > > > > >> allowed to have > > > > more > > > > > > > > > > > >> bits than needed. > > > > > > > > > > > >> > > > > > > > > > > > >> This optimization is currently enabled for AArch64 > > > > > > > > > > > >> only. Below > > > > example > > > > > > > > > > > >> shows how it uses AdvSIMD vectors as subvectors of SVE > > vectors > > > > for > > > > > > > > > > > >> higher instruction throughput. > > > > > > > > > > > >> > > > > > > > > > > > >> Consider this loop operating on an array of 16-bit > > > > > > > > > > > >> integers: > > > > > > > > > > > >> > > > > > > > > > > > >> for (int i = 0; i < 5; i++) { > > > > > > > > > > > >> a[i] = a[i] < 0 ? -a[i] : a[i]; > > > > > > > > > > > >> } > > > > > > > > > > > >> > > > > > > > > > > > >> Before this patch, the generated AArch64 code would be: > > > > > > > > > > > >> > > > > > > > > > > > >> ptrue p7.h, vl5 > > > > > > > > > > > >> ptrue p6.b, all > > > > > > > > > > > >> ld1h z31.h, p7/z, [x0] > > > > > > > > > > > >> abs z31.h, p6/m, z31.h > > > > > > > > > > > >> st1h z31.h, p7, [x0] > > > > > > > > > > > > > > > > > > > > > > > > p6.b has all lanes active - why is the abs then not > > > > > > > > > > > > simply unmasked? > > > > > > > > > > > > > > > > > > > > > > There is no unpredicated abs for SVE. The predicate has > > > > > > > > > > > to be there, > > > > > > > > > > > and so expand introduces one even when the gimple stmt is > > > > > > unconditional. > > > > > > > > > > > > > > > > > > > > > > >> After this patch, it is optimized to: > > > > > > > > > > > >> > > > > > > > > > > > >> ptrue p7.h, vl5 > > > > > > > > > > > >> ld1h z31.h, p7/z, [x0] > > > > > > > > > > > >> abs v31.8h, v31.8h > > > > > > > > > > > >> st1h z31.h, p7, [x0] > > > > > > > > > > > > > > > > > > > > > > > > Help me decipher this - I suppose z31 and v31 "overlap" > > > > > > > > > > > > in the > > > > > > > > > > > > register file? And z31 is a variable-length vector but > > > > > > > > > > > > z31.8h is a 8 element fixed length vector? How can we > > > > > > > > > > > > > > > > > > > > > > v31.8h, but otherwise yes. > > > > > > > > > > > > > > > > > > > > > > > end up with just 8 elements here? From the upper > > > > > > > > > > > > interation > > > > > > > > > > > > bound? > > > > > > > > > > > > > > > > > > > > > > Yeah. > > > > > > > > > > > > > > > > > > > > > > > I'm not sure why you need any target hook here. 
It > > > > > > > > > > > > seems you > > > > > > > > > > > > do already have suitable vector modes so why not just > > > > > > > > > > > > ask > > > > > > > > > > > > for a suitable vector? Is it because you need to have > > > > > > > > > > > > that register overlap guarantee (otherwise you'd get > > > > > > > > > > > > a move)? > > > > > > > > > > > > > > > > > > > > > > Yeah, the optimisation only makes sense for overlaid > > > > > > > > > > > vector > > registers. > > > > > > > > > > > > > > > > > > > > > > > Why do we not simply use fixed-length SVE here in the > > > > > > > > > > > > first place? > > > > > > > > > > > > > > > > > > > > > > Fixed-length SVE is restricted to cases where the exact > > > > > > > > > > > runtime > > length > > > > > > > > > > > is known: the compile-time length is both a minimum and a > > maximum. > > > > > > > > > > > In contrast, the code above would work even for 256-bit > > > > > > > > > > > SVE. > > > > > > > > > > > > > > > > > > > > > > > To me doing this in this way in the vectorizer looks > > > > > > > > > > > > somewhat out-of-place. > > > > > > > > > > > > > > > > > > > > > > > > That said, we already have unmasked ABS in the IL: > > > > > > > > > > > > > > > > > > > > > > > > vect__1.6_15 = .MASK_LOAD (&a, 16B, { -1, -1, -1, -1, > > > > > > > > > > > > -1, 0, 0, 0, > > 0, > > > > 0, > > > > > > > > > > > > 0, 0, 0, 0, 0, 0, ... }, { 0, ... }); > > > > > > > > > > > > vect__2.7_16 = ABSU_EXPR <vect__1.6_15>; > > > > > > > > > > > > vect__3.8_17 = VIEW_CONVERT_EXPR<vector([8,8]) short > > > > > > > > > > int>(vect__2.7_16); > > > > > > > > > > > > .MASK_STORE (&a, 16B, { -1, -1, -1, -1, -1, 0, 0, 0, > > > > > > > > > > > > 0, 0, 0, 0, 0, 0, > > > > > > > > > > > > 0, 0, ... }, vect__3.8_17); [tail call] > > > > > > > > > > > > > > > > > > > > > > > > so what's missing here? I suppose having a constant > > > > > > > > > > > > masked > > ABSU > > > > here > > > > > > > > > > > > would allow RTL expansion to select a fixed-size mode? > > > > > > > > > > > > > > > > > > > > > > > > And the vectorizer could simply use the existing > > > > > > > > > > > > related_vector_mode hook instead? > > > > > > > > > > > > > > > > > > > > > > I agree it's a bit awkward. The problem is that we want > > > > > > > > > > > conflicting > > > > > > > > > > > things. On the one hand, it would make conceptual sense > > > > > > > > > > > to use > > SVE > > > > > > > > > > > instructions to provide conditional optabs for Advanced > > > > > > > > > > > SIMD > > vector > > > > > > modes. > > > > > > > > > > > E.g. SVE's LD1W could act as a predicated load for an > > > > > > > > > > > Advanced > > SIMD > > > > > > > > > > > int32x4_t vector. The main problem with that is that > > > > > > > > > > > Advanced > > SIMD's > > > > > > > > > > > native boolean vector type is an integer vector of 0s and > > > > > > > > > > > -1s, rather > > > > > > > > > > > than an SVE predicate. For some (native Advanced SIMD) > > operations > > > > we'd > > > > > > > > > > > want one type of boolean, for some (SVE emulating Advanced > > SIMD) > > > > > > > > > > > operations we'd want the other type of boolean. > > > > > > > > > > > > > > > > > > > > > > The patch goes the other way and treats using Advanced > > > > > > > > > > > SIMD as > > an > > > > > > > > > > > optimisation for SVE loops. > > > > > > > > > > > > > > > > > > > > > > related_vector_mode suffers from the same problem. 
If we > > > > > > > > > > > ask for > > a > > > > > > > > > > > vector mode of >=5 halfwords for a load or store, we want > > > > > > > > > > > the SVE > > > > mode, > > > > > > > > > > > since that can be conditional on an SVE predicate. But > > > > > > > > > > > if we ask for > > > > > > > > > > > a vector mode of >=5 halfwords for an integer absolute > > > > > > > > > > > operation, > > > > > > > > > > > we want the Advanced SIMD mode. So I suppose the new > > > > > > > > > > > hook is > > > > > > effectively > > > > > > > > > > > providing context. Perhaps we could do that using an > > > > > > > > > > > extra > > parameter > > > > to > > > > > > > > > > > related_vector_mode, if that seems better. > > > > > > > > > > > > > > > > > > > > > > It's somewhat difficult to recover this information after > > vectorisation, > > > > > > > > > > > since like you say, the statements are often > > > > > > > > > > > unconditional and > > operate > > > > > > > > > > > on all lanes. > > > > > > > > > > > > > > > > > > > > So it seems we want to query if there's a lowpart > > > > > > > > > > fixed-size vector > > > > > > > > > > mode available for a given other mode. It seems to me that > > > > > > > > > > we > > > > > > > > > > should have a way to query for this already without having > > > > > > > > > > a new > > > > > > > > > > target hook using general code? > > > > > > > > > > > > > > > > So any answer to this? You should be able to iterate over all > > > > > > > > vector modes, look for those with fixed size and fitting the > > > > > > > > lane constraint and then asking whether the modes are tieable > > > > > > > > or whatever else is the correct way to verify the constraint? > > > > > > > > > > > > > > > > So sth as simple as > > > > > > > > > > > > > > > > mode = mode_for_vector (GET_MODE_INNER (vmode), ceil_pow2 > > (const- > > > > > > > > nunits)); > > > > > > > > if (targetm.modes_tieable_p (mode, vmode)) > > > > > > > > return mode; > > > > > > > > > > > > > > > > ? Why do we need a target hook for this? What's the "hidden" > > > > > > > > constraint I'm missing? > > > > > > > > > > > > > > > > > > > > > > Richard can correct me if I'm wrong (probably) but the problem > > > > > > > with this > > > > > > > is that it won't work with VLS e.g. -msve-vector-bits because the > > > > > > > SVE > > modes > > > > > > > are fixed size then. Secondly it'll have issues respecting > > > > > > > --param aarch64- > > > > > > autovec-preference= > > > > > > > as this is intended to only affect autovec where mode_for_vector > > > > > > > is > > general. > > > > > > > > > > > > > > The core of this optimization is that it must change to Adv. SIMD > > > > > > > over SVE > > > > > > modes. > > > > > > > > > > > > OK, so SVE VLS -msve-vector-bits=128 modes are indistinguishable > > > > > > from > > Adv. > > > > > > SIMD > > > > > > modes by the middle-end? > > > > > > > > > > I believe so, the ACLE types have an annotation on them to lift some > > > > > of the > > > > > restrictions but the modes are the same. > > > > > > > > > > > Is there a way to distinguish them, say, by cost > > > > > > (target_reg_cost?)? Since any use of a SVE reg will require a > > > > > > predicate reg? > > > > > > > > > > > > > > > > We do have unpredicated SVE instructions, but yes costing could work. > > > > > Essentially what we're trying to do is find the cheapest mode to > > > > > perform > > > > > the operation on. > > > > > > > > > > This could work.. But how would we incorporate it into the costing? 
> > > > > Part of > > > > > the problem is that to iterate over similar modes with the same > > > > > element size > > > > > likely requires some target input no? Or are you saying we should > > > > > only > > > > > iterate over fixed size modes? > > > > > > > > > > Regards, > > > > > Tamar > > > > > > > > > > > I think we miss a critical bit of information in the middle-end > > > > > > here and I'm > > > > > > trying to see what piece of information that actually is. > > "find_subvector_type" > > > > > > doesn't seem to be it, it's maybe using that hidden bit of > > > > > > information for > > > > > > one specific use-case but it would be far better to have a way for > > > > > > the target > > > > > > to communicate the missing bit of information in a more generic way? > > > > > > We can then wrap a "find_subvector_type" around that. > > > > > > > > So for this one sth like targetm.mode_requires_predication ()? But > > > > as Tamar says above this really depends on the operation. But the > > > > optabs do _not_ expose this requirement (we have non-.COND_ADD for > > > > SVE modes), but you want to take advantage of this difference. > > > > Can we access insn attributes from optab entries? Could we add > > > > some "standard" attribute noting that an insn requires a predicate? > > > > But of course that likely depends on the alternative? > > > > > > > > > > We'd likely also require the mask that would be used, because I think > > > otherwise > > > targetm.mode_requires_predication would be a bit ambiguous for non-flag > > setting > > > instructions or instructions that don’t do cross lane operations. > > > > > > e.g. SVE has both COND_ADD and ADD. But the key here is that if we know > > > we'll > > > access the bottom 64 or 128 bits we could use an Adv. SIMD ADD. > > > > But SVE ADD still requires a predicate register (with all lanes enabled), > > no? That's the whole point of the optimization we're discussing? > > I see the only problem with -msve-vector-bits=N where GET_MODE_SIZE > > is no longer a POLY_INT - otherwise that would be the easy > > way to identify Adv. SIMD vs. SVE and heuristically prefer > > fixed-size modes in the vectorizer when possible (for small known > > niter <= the fixed-size mode number of lanes). But with > > -msve-vector-bits=128 GET_MODE_SIZE for Adv. SIMD and SVE is equal(?), > > so we need another way to distinguish. Because even with > > -msve-vector-bits=128 you need the predicate register appropriately > > set up as I understand you are not altering the SVE HW config which > > would be also possible(?), but I'm not sure that would make it > > possible to have a predicate register less ADD instruction. > > > > What SVE register taking machine instructions do not explicitly/implicitly > > use one of the SVE predicate registers? > > Many, ADD for instance is this > https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/ADD--vectors--unpredicated---Add-vectors--unpredicated-- > > And SVE2 added many more. GCC already takes advantage of this and drops > predicates entirely when it can to avoid the dependency on the predicate pipe. > > Those are actually different instructions not just aliases.
I see. So this clearly is a feature on instructions then, not modes. In fact it might be profitable to use unpredicated add to avoid computing the loop mask for a specific element width completely even when that would require more operations for a wide SVE implementation. For the patch at hand I would suggest to re-post without a new target hook, ignoring the -msve-vector-bits complication for now and simply key on GET_MODE_SIZE being POLY_INT, having a vectorizer local helper like tree get_fixed_size_vectype (tree old_vectype, unsigned nlanes-upper-bound) ? > BIC for instance > https://developer.arm.com/documentation/ddi0602/2025-03/SVE-Instructions/BIC--vectors--unpredicated---Bitwise-clear-vectors--unpredicated-- > > Regards, > Tamar > > > > > > Where the > > > operation would be beneficial for longer VL cores where the Adv. SIMD > > > vector > > > pipes are multiplexed on the SVE ones. Such as Neoverse-V1, but not > > > Neoverse- > > V2. > > > > > > Without the predicate being considered (or niters) SVE would have to > > > return false > > > for the hook. > > > > > > Which is why an attribute may be tricky. > > > > > > Regards, > > > Tamar > > > > > > > Richard. > > > > > > > > > > > > > > Thanks, > > > > > > Richard. > > > > > > > > > > > > > > > > It doesn't really fit related_vector_mode I guess. > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yeah I don't think it would work unless as Richard mentioned we > > > > > > > > > have > > > > > > > > > an argument to indicate which SIMD class you want to end up > > > > > > > > > with. > > > > > > > > > > > > > > > > > > > I also wonder if we can take it as a given that SVE and neon > > > > > > > > > > inter-operate efficiently for all implementations, without > > > > > > > > > > some > > > > > > > > > > kind of "rewriting" penalty. Like in the above example we > > > > > > > > > > set NEON v31 and use it as source for a SVE store via z31. > > > > > > > > > > > > > > > > > > > > > > > > > > > > It's an architectural requirement that the register files > > > > > > > > > overlap. > > > > > > > > > This is described in section B1.2 (Registers in AArch64 > > > > > > > > > execution state). > > > > > > > > > Any core that ever implements non-overlapping registers where > > > > > > > > > they > > > > > > > > > would need a rewiring penalty has bigger problems to worry > > > > > > > > > about, > > > > > > > > > as an example the architecture describes that writing to the > > > > > > > > > lower part > > of > > > > > > > > > an Adv. SIMD register will clear the top parts up to VL of the > > > > > > > > > SVE "view". > > > > > > > > > > > > > > > > > > Any such rewiring would mean that Adv. SIMD and Scalar FPR > > instructions > > > > > > > > > become useless due to the defined state of the larger views > > > > > > > > > on the > > > > register. > > > > > > > > > > > > > > > > I'm aware of the architectural requirement - it's just that I > > > > > > > > could > > > > > > > > think of the HW re-configuring itself for masked operations and > > > > > > > > thus > > > > > > > > switching back and forth might incur some penalty. If it is > > > > > > > > common > > > > > > > > practice to mix SVE and NEON this way it's of course unlikely > > > > > > > > such > > > > > > > > a uarch would be successful. But then, powering off mask & high > > > > > > > > vector > > > > > > > > part handling logic when facing NEON might be a possibility.
> > > > > > > > > > > > > > > Yeah the reason for this optimization has more to do with how the > > > > > > > vector > > > > > > > pipes are split between Adv. SIMD and SVE. An easy one is say > > > > > > > reductions, > > > > > > > the bigger VL the more expensive in-order reductions like addv > > > > > > > become. > > > > > > > But Adv. SIMD reductions have a fixed cost, and if we know we > > > > > > > only need > > to > > > > > > > reduce the bottom N-bits it'll always beat SVE reductions. > > > > > > > > > > > > > > Others like MUL just have a higher throughput in Adv. SIMD vs SVE > > > > > > > on e.g. > > VL > > > > > > 256 > > > > > > > bit cores. So it's not just the masking but vector length in > > > > > > > general. > > > > > > > > > > > > > > And the reason we don't pick Adv. SIMD for such loops is that SVE > > > > > > > allows > > > > partial > > > > > > masking, > > > > > > > so for e.g. MUL it's ok for us to multiply with an unknown valued > > > > > > > lane > > since > > > > > > predication > > > > > > > makes the usages of the result safe. > > > > > > > > > > > > > > Thanks, > > > > > > > Tamar > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > Tamar > > > > > > > > > > > > > > > > > > > Richard. > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > Richard
--
Richard Biener <rguent...@suse.de> SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg, Germany; GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)
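For illustration, the vectorizer-local helper suggested above might look roughly like the sketch below. This is only a sketch of the idea under discussion, not code from the patch or the thread: the name get_fixed_size_vectype follows the suggestion in the reply, but the exact checks (rounding the lane bound up to a power of two, querying mode_for_vector and targetm.modes_tieable_p, keying on a non-constant GET_MODE_SIZE) are assumptions about how it could be wired up, and the code is untested.

static tree
get_fixed_size_vectype (tree old_vectype, unsigned int nlanes)
{
  machine_mode old_mode = TYPE_MODE (old_vectype);

  /* Only interesting when the current vector mode is variable-length,
     i.e. its size is a POLY_INT rather than a compile-time constant.  */
  if (GET_MODE_SIZE (old_mode).is_constant ())
    return NULL_TREE;

  /* Fixed-size vector modes have power-of-two lane counts, so round
     the upper bound on the number of lanes up to a power of two.  */
  unsigned HOST_WIDE_INT nunits = HOST_WIDE_INT_1U << ceil_log2 (nlanes);

  /* Look for a fixed-size vector mode with the same element mode.  */
  machine_mode fixed_mode;
  if (!mode_for_vector (GET_MODE_INNER (old_mode), nunits).exists (&fixed_mode)
      || !VECTOR_MODE_P (fixed_mode)
      || !GET_MODE_SIZE (fixed_mode).is_constant ()
      /* Reusing the register as its lowpart is only valid if the two
	 modes tie.  */
      || !targetm.modes_tieable_p (fixed_mode, old_mode))
    return NULL_TREE;

  return build_vector_type_for_mode (TREE_TYPE (old_vectype), fixed_mode);
}

A caller would presumably only try this when the loop's known trip count fits in nlanes lanes, falling back to old_vectype when the helper returns NULL_TREE.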