Hi,
Thank you all for an interesting and pleasant experience. I am very
grateful to Linaro for the opportunity to meet and work with such an
amazing group of people. I wish you all the best, and hope to meet you
again (at least online).
You can find me at i...@il.ibm.com or ira@gmail.com.
Ira
On 20 December 2011 02:03, Michael Hope wrote:
> Hi there. I've looked further into the intermittent
> gcc/testsuite/g++.dg/cdce3.C test failures. Taking Ira's
> vectoriser-only fix-pr51301-4.6 branch and comparing it with its
> predecessor r106845:
> * cdce3.o itself is identical across compi
Hi,
After learning how to control MEM_ALIGN and, therefore, alignment
hints from the vectorizer, I was able to generate 64-bit hints (with
the help of Ramana's patches). I saw a 16% improvement on a benchmark
with stack variables, for which we now force alignment to 64 bits and
create alignment hi
Hi,
- fixed PR 51285
- continued looking at the alignment issue, ran Michael's script with
different options, tested Ramana's preliminary patch for vld1/vst1,
and my "don't peel for low loop bounds" patch
Ira
___
linaro-toolchain mailing list
linaro-t
On 7 December 2011 22:36, Andrew Stubbs wrote:
> Hi all,
Hi Andrew,
>
> I've copied all those who made commits to GCC 4.6 this month.
>
> Could you please give me a sentence or two for the release notes?
I committed several patches to enable SLP of libav's weight-h264-pixels16x16-8:
- support S
Hi,
- Ran eon with gcc 4.7: there are many more loops similar to the one
in lp#831094 that get vectorized (due to some data ref analysis
improvement), so the impact of disabling peeling for such loops (i.e.
loops with low loop bound) is even bigger than for 4.6, and
vectorization improves the perf
On 30 November 2011 22:28, Michael Hope wrote:
>>> This run also showed the effect of loop unrolling. The loop seems to
>>> be unrolled for loops of <= 64 words and drops off in performance past
>>> around 8 words. When the unrolling finally drops out, performance
>>> increases by 101 %.
>>
>> I
On 30 November 2011 02:33, Michael Hope wrote:
> I then converted the vld1 and vst1 to specify an alignment of 64
> bits. See:
> http://people.linaro.org/~michaelh/incoming/set-alignment.png
>
> This improved the throughput in all cases and in cases for more than 50
> words by 14 %. This graph
On 24 November 2011 15:32, Ira Rosen wrote:
> * Disabling peeling for low loop bounds also helps with one of EEMBC
> benchmarks, for which vectorization with double-words is more
> beneficial than with quad-words. It turns out that we are able to
> force the alignment for doubl
Hi,
* Worked on peeling problem in eon (#831094). Wrote a patch that
checks if the number of vector iterations is going to be more than 2,
and disables peeling otherwise. With this patch I see about 1.5%
regression with vectorization (and about 7% without it).
* I am thinking to extend the patch
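The check described in the first item can be sketched roughly as follows. This is only an illustration, not the actual patch: the function name `should_peel_for_alignment` and its parameters are hypothetical, and it assumes the loop bound and the vectorization factor are both known at compile time.

```c
#include <stdbool.h>

/* Hypothetical helper, not GCC code: decide whether peeling for alignment
   is worthwhile for a loop with a known iteration count. */
bool should_peel_for_alignment(unsigned loop_bound, unsigned vf)
{
    if (vf == 0)
        return false;   /* no vectorization factor, nothing to peel for */

    unsigned vector_iters = loop_bound / vf;

    /* Peel only if more than 2 vector iterations are expected. */
    return vector_iters > 2;
}
```

With a vectorization factor of 4, a loop of 16 iterations would still be peeled under this rule, while a loop of 8 iterations would not.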
Hi,
- spent most of the week trying to reproduce regressions with vectorization
- started bringing the latest SLP feature, condition with different
types, to gcc-linaro. There are 5 patches. Merged one, started to
prepare another one.
- fixed PR 51112
Ira
Hi,
- SLP improvements for weight-h264-pixels16x16-8 (libav):
- conditions in SLP - committed upstream
- support pattern detection in SLP - implemented
- enhance mixed condition pattern to handle non-constant then/else
clauses - implemented
weight-h264-pixels16x16-8 now gets vectorize
Hi,
- Finished rewriting SLP analysis to support not only unary and binary
operations. Committed upstream.
- Implemented cond_expr support in SLP (for libav weight_h264_pixels).
Testing it now.
- Vectorizer maintenance (test/bug fixes, patch reviews).
Ira
Hi,
- Merged to gcc-linaro:
- widening shifts
- SLP features: support loads with different offsets and swap
operands if necessary
- Started rewriting SLP analysis to support operations with more than
two operands (towards SLP of conditions)
- Updated NEON presentation following Ramana's sugg
Hi,
* widening shifts - finally committed upstream
* SLP loads with different offsets and operand swaps - committed upstream
* SLP with multiple types - merged to gcc-linaro-4.6
* vectorizer stuff: patch review, test fixes, discussions, bug fix
* Ramana and I discussed what can be done with VEC_PE
Hi,
* Finished a presentation for NEON forum. Revital and Richard kindly
agreed to take a look and gave me some valuable comments. Thanks!
* widen-shifts:
- While preparing the presentation I found some room for improvement
in the pattern detection, so I implemented it. It gave an additional 13%
t
Hi,
- worked on the RTL part of the widen-shift patch
- backported to linaro 2/3 of the SLP patches, and proposed the third one
- worked on additional SLP improvements:
- swap operands to make statements isomorphic
- support load with offset 1 (after load from 0)
- started working on presentat
Hi,
* change default vector size patch - merged to linaro-gcc
* SLP improvements committed upstream:
- allow non-affine accesses
- auto-detect vector size
- support multiple types and promotion/demotion operations
Ira
Hi,
* widening shifts patch - submitted upstream
* change default vector size patch - submitted to linaro-gcc
* automatic choice of vector size for basic block vectorization - testing
* vectorizer bug fixes
Next week we have New Year holiday on Wednesday (half day) and Thursday.
Ira
Hi,
* testing widen-shifts patch on ARM
* SLP improvements:
- submitted a patch to allow non-simple IVs in SLP
- committed a patch to allow read-after-read dependencies in SLP
Ira
Hi,
* merged vector over-promotion patch to linaro-gcc-4.6
* committed upstream the change of the default vector size for NEON
* continued working on widening shifts
Ira
Hi,
- catching up with the mail
- fixed GCC PR50178
- prepared a patch to fix vectorizer dumps
Ira
___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Hi,
- change of default vector size for auto-vectorization on NEON -
submitted and approved
- continued working on vectorization of widening shifts
- looked into SLP vectorization for libav
- two vacation days
I'll be on vacation on Aug 22-30.
Ira
On 18 August 2011 12:45, Andrew Stubbs wrote:
> On 18/08/11 06:56, Ira Rosen wrote:
>>>
>>> How can I tell the vectoriser that a input is a multiple of something?
>>
>> Unfortunately, I don't think you can.
>
> I think you can do something like this:
On 18 August 2011 02:43, Michael Hope wrote:
> On Thu, Aug 18, 2011 at 11:11 AM, Michael Hope
> wrote:
>> On Tue, Aug 16, 2011 at 11:32 PM, Richard Sandiford
>> wrote:
>>> Michael Hope writes:
I put a build harness around libav and gathered some profiling data. See:
bzr branch lp:~
Hi,
* fixed PR 50014 and 50039 - to be backported to linaro-gcc
* tested the patch to change the default vector size on NEON
* found one test that fails with quad-words -
gcc.c-torture/execute/mode-dependent-address.c. Debugging it with
Ramana.
* started looking into widening shifts
Vacation plan
Hi,
* committed upstream a patch that reduces over-promotion of vector operations
* started to work on a new version of the patch to change the default
vector size for Neon
* attended Linaro connect
Ira
On 5 August 2011 08:53, Richard Sandiford wrote:
> As well as using a more recent compiler, the new version also uses
> -mvectorize-with-neon-quad. Once again it shows a significant improvement
> over the default.
Richard E., Ramana and I finally came to an agreement about how it
should work, s
Hi,
I am checking the coverage of the NEON instructions mostly by writing
tests in C to check which instructions are generated (after
auto-vectorization) and which are not.
I put the list of things that I've checked so far here:
https://wiki.linaro.org/IraRosen/Sandbox/InstructionCoverage
Ira
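A test of this kind might look like the sketch below. The function name is hypothetical, and whether the loop is actually turned into a NEON instruction such as vmax.s16 depends on the compiler version and flags. That is exactly what such a test is meant to find out, by inspecting the assembly (for example from `gcc -std=c99 -O3 -mfpu=neon -S`).

```c
#include <stddef.h>

/* Hypothetical coverage test: element-wise maximum of two short arrays.
   After auto-vectorization one would look for a vector max instruction
   in the generated assembly. */
void max_s16(short *restrict out, const short *restrict a,
             const short *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] > b[i] ? a[i] : b[i];
}
```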
Hi,
- I finally submitted the over-widening patch, but Richard Guenther
thought that this optimization should be done for scalars as well, and
he is now working on this himself.
- Some auto-vectorizer fixes
Ira
Hi,
- merged over-widened multiply patch to gcc-linaro-4.6 (now vectorized
rgbyiqv should be about as good as its scalar version)
- continued working on over-widened shifts and bit operations
Ira
Hi,
- continued working on prevention of over-widening in vectorization -
finalizing the patch
- improvement of vectorizer peeling heuristic - merged to gcc-linaro-4.6
- vectorization of widen-mult with over-promoted operands - proposed
for merge to gcc-linaro-4.6
- fixed PR 49610
- patch reviews
Sorry, I forgot to mention that I will be on vacation on July 7-10.
Ira
On 30 June 2011 15:19, Ira Rosen wrote:
> Hi,
>
> - support of multiple uses of original pattern statements (needed for
> over-promotion work) - committed upstream
> - support of widen-mult of unsigned type
Hi,
- support of multiple uses of original pattern statements (needed for
over-promotion work) - committed upstream
- support of widen-mult of unsigned types and constants - merged to
gcc-linaro-4.6
- vectorizer peeling heuristic improvement - proposed to merge to gcc-linaro-4.6
Ira
Hi,
- investigating detection of general over-widening cases in the vectorizer
- improvements of widen-mult - proposed for merge to gcc-linaro-4.6
- fixed PRs 49443 and 49478
Ira
Hi,
- fix vectorizer testsuite failures on ARM - committed
- committed a fix of a bug in the vectorizer revealed by the widen-mult patch
- committed an improvement of peeling heuristic
- reduce over-widening in case of multiplication by a constant
(improves vectorized rgbyiq by almost 2x) - commit
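To illustrate what over-widening means here: C's integer promotions make a multiplication of an 8-bit value by a constant happen in `int`, although 16 bits may be enough for the result. The sketch below is an illustration under stated assumptions, not code from rgbyiq: the constant 77 and both function names are made up. It shows the promoted form and an equivalent form that only needs a 16-bit intermediate value, which is what would let twice as many elements fit in one vector.

```c
#include <stdint.h>

/* As the C language evaluates it: the product is computed in int. */
uint8_t scale_promoted(uint8_t x)
{
    int product = x * 77;               /* x is promoted to int */
    return (uint8_t)(product >> 8);
}

/* Narrowed form: 255 * 77 = 19635 fits in 16 bits, so a 16-bit
   intermediate gives the same result for every input. */
uint8_t scale_narrow(uint8_t x)
{
    uint16_t product = (uint16_t)(x * 77);
    return (uint8_t)(product >> 8);
}
```

The two functions agree for all 256 possible inputs, which is the property a narrowing transformation has to preserve.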
Hi,
- vectorization of widening multiplication of unsigned types and
constants - committed to mainline
- fix vectorizer testsuite failures on ARM - submitted
- testing a patch to fix a bug in the vectorizer revealed by the
widen-mult patch
- testing a patch to fix bad peeling heuristic that causes
Hi,
- bug fixes: PRs 49222, 49199, 49239, 49093
- widening multiplication: submitted a patch to support widen-mul for
unsigned types and constants in the vectorizer's pattern recognizer.
Now considering moving the optimize_widening_mul pass before loop
optimizations and improving it to support unsigned
Hi,
* PR 49087 - fixed
* PR 49038 - opened by Richard - fixed on 4.7, to be backported to 4.5 and 4.6
* working on widening multiplication for unsigned types and constants
(the signed case works fine)
Ira
> At the moment good 'ol
> CoreMark is worse with -O3 -mfpu=neon...
It may be worth trying -fvect-cost-model.
Ira
>
> -- Michael
>
Hi,
* committed a patch that supports reductions in SLP (upstream)
* continued analyzing benchmarks: ffmpeg, EEMBC telecom, office, networking
* started to look into implementation of reverse accesses for Neon
* blueprints
Ira
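For context, a reduction that SLP can handle is one where several independent accumulations sit next to each other in the loop body. The loop below is a made-up example of that shape (the function and variable names are hypothetical), not a loop taken from the benchmarks.

```c
#include <stddef.h>

/* Two independent reductions over interleaved data. Packing s0 and s1
   into one vector accumulator is the kind of grouping SLP looks for. */
void sum_pairs(const int *in, size_t n_pairs, int *sum_even, int *sum_odd)
{
    int s0 = 0, s1 = 0;

    for (size_t i = 0; i < n_pairs; i++) {
        s0 += in[2 * i];        /* even-indexed elements */
        s1 += in[2 * i + 1];    /* odd-indexed elements */
    }

    *sum_even = s0;
    *sum_odd = s1;
}
```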
Hi,
* continued looking into ffmpeg/libavcodec:
- dcadsp.c - the inner loop contains reverse accesses which are not
supported on Neon. I think we can handle them using vrev and vswp.
- a lot of loops have unknown memory stride. I am exploring a
possibility of a combination of scalar loads and
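The vrev/vswp idea for reverse accesses can be modelled in plain C. The sketch below is only a scalar model of the lane movement for a quad-word of four 32-bit elements, under the assumption that vrev64 reverses the elements inside each double-word and vswp exchanges the two double-words. The function names are hypothetical and this is not compiler code.

```c
#include <stdint.h>

enum { LANES = 4 };     /* four 32-bit elements in one quad-word */

static void swap_lanes(int32_t *x, int32_t *y)
{
    int32_t t = *x;
    *x = *y;
    *y = t;
}

/* Model of vrev64.32: reverse the elements within each 64-bit half. */
void vrev64_model(int32_t q[LANES])
{
    swap_lanes(&q[0], &q[1]);
    swap_lanes(&q[2], &q[3]);
}

/* Model of vswp on the two halves of a quad-word: exchange them. */
void vswp_model(int32_t q[LANES])
{
    swap_lanes(&q[0], &q[2]);
    swap_lanes(&q[1], &q[3]);
}

/* Full reversal of the quad-word: {a, b, c, d} becomes {d, c, b, a}. */
void reverse_quad(int32_t q[LANES])
{
    vrev64_model(q);
    vswp_model(q);
}
```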
Hi,
- backported vzip fix to GCC 4.5 and 4.6 (PR 48252)
- merged auto-detection of vector size patch to gcc-linaro 4.6
- started looking into vectorization of ffmpeg
Ira
Hi,
gcc-linaro/4.6
- if-conversion improvement (needed to vectorize Telecom/viterbi) - merged
- auto-detection of vector size - proposed for merging
GCC FSF
- fixed PR 48765 on trunk
- completed testing of vzip fix (PR 48252) on gcc 4.5 and gcc 4.6
Ira
On 13 April 2011 18:48, Richard Sandiford wrote:
> I've now submitted the initial vldN and vstN work, so I thought I'd see
> how often it triggers for natty's libav package. I've put some initial
> results here:
>
> https://wiki.linaro.org/RichardSandiford/Sandbox/NeonLibAv
>
> There are more
Hi,
* continued bringing patches upstream
- changing default vector size to 128 - resubmitted with changes
according to comments, awaiting review
- if-conversion improvement - committed
* PR 48252 - bug in vzip/vuzp/vtrn implementation - patch submitted
* opened PR 48454 - a test failure wit
Hi,
* continued bringing patches upstream
- auto-detection of vector size - committed
- changing default vector size to 128 - submitted and testing the
final version
- if-conversion improvement - submitted and now testing the final version
* gcc-linaro-4.6
- submitted a merge request for
Hi,
* resubmitted and committed store sink patch to trunk, I'll commit it
to gcc-linaro-4.6 next week
* submitted autodetection of vector size patch to gcc-patches, I'll
commit it next week
* started testing a patch that makes -mvectorize-with-neon-quad the default
* DenBench: found some more cases
Hi,
* submitted store sinking patch to mainline
* started testing auto-detection of vector size patch
* DENBench - some benchmarks are still unstable, I am looking into
stable regressions, adjusting and fixing the cost model for them
Next week:
Sunday and Monday - holidays
Ira
On 15 March 2011 13:30, Richard Sandiford wrote:
> Ira Rosen writes:
> >> > How do you distinguish between "multiple structures" and "single
> >> > structure to all lanes"?
> >>
> >> Sorry, I'm not sure I underst
Hi,
* continued working on cost model tuning. I don't see much difference
running EEMBC DenBench with and without vectorization enabled (and,
therefore, also with and without cost model).
Also, I have to say that the results are not stable and I sometimes
get 10% difference just running the same
> [Sorry, forgot to CC: the list]
>
> Hi Ira,
>
> Thanks for the feedback.
>
> On 6 March 2011 09:20, Ira Rosen wrote:
> > > So how about the following functions? (Forgive the pascally syntax.)
> > >
> > > __builtin_load_lanes (REF : array N
er to allow these functions to be created in the
> original source code. This is throw-away code though; it would never
> be submitted.
>
> I've also included a simple test case and the output I get from it.
> The output looks pretty good; there's not even the stray VMOV tha
Hi,
* vectorizer cost model
- implemented builtin_vectorization_cost for NEON
- added register spilling considerations to the cost model
- started testing/tuning on EEMBC Telecom and DenBench (for now I
have only two examples for spilling: fdct_int32 mp4encode that
shouldn't get vectorized a
Hi,
This week I looked into DENBench:
* sad8_c (hot function from mp4encode) needs SLP reduction, but it
also contains cond_expr which cannot be vectorized as reduction, so I
don't think there is anything I can do here
* fdct_int32 (another hot function from mp4encode) now gets vectorized
with vzi
Hi,
* regtested vzip/vuzp patch
* looked into big-endian build
* applied all the required patches and checked that Viterbi gets
vectorized giving ~2x performance improvement (compiled with
cross-compiler)
* looked into vld/vst implementation - mostly discussions with Richard
* DenBench analysis:
On 8 February 2011 17:34, Julian Brown wrote:
> On Tue, 8 Feb 2011 11:22:32 +
> Julian Brown wrote:
>
>> IIRC I couldn't figure out the magic incantation needed to do it last
>> time I tried. I don't think the "--with-endian=xxx" option is
>> supported for ARM. Possibly the way to do it is to
On 7 February 2011 18:24, Julian Brown wrote:
> On Mon, 7 Feb 2011 17:18:40 +0200
> Ira Rosen wrote:
>
>> Hi,
>>
>> I'd like to check vzip/vuzp patch in big endian mode. But when I try
>> to compile with -mbig-endian flag, I get
>>
>> > ~/m
Hi,
I'd like to check vzip/vuzp patch in big endian mode. But when I try
to compile with -mbig-endian flag, I get
> ~/mainline/bin/bin/gcc -O3 -mfloat-abi=softfp -mfpu=neon neon-vtrnu8.c
> -mbig-endian
/home/irar/mainline/bin/lib/gcc/armv7l-unknown-linux-gnueabi/4.6.0/../../../libgcc_s.so.1:
cou
Hi,
I continued to work on vect_interleave and vect_extract implementation on NEON:
* debugged the compiler to find out what's the problem with
neon_vzip/vuzp_internal
* fixed it following Uli's advice
* checked how neon_vzip/vuzp_internal work for intrinsics by
writing tests
* fixed the patch
On 1 February 2011 16:23, Ulrich Weigand wrote:
>>
>> Are they actually broken ? I'd be worried if that were the case. My
>> understanding is that the
>> existing ones are being used for the Neon intrinsics / builtins.
>
> Yes, they're broken, for the reason Ira originally points out:
Right. Th
On 1 February 2011 11:47, Ira Rosen wrote:
> Thanks a lot! It seems to work. It fixed the problem and I am now
> testing the patch on the rest of the vectorizer testsuite.
After testing only with the vectorizer testsuite (which contains at
least 30 tests for strided accesses), I'd
On 31 January 2011 16:53, Ulrich Weigand wrote:
> Ira Rosen wrote:
>
>> (define_insn "neon_vzip_internal"
>> [(set (match_operand:VDQW 0 "s_register_operand" "=w")
>> (unspec:VDQW [(match_operand:VDQW 1 "s_register_oper
Hi,
I am trying to implement interleave_high/low and extract_even/odd
using vzip and vuzp instructions. I am attaching a patch that attempts
to do that. It uses already existing neon_vzip_internal. The
problem with it is that it doesn't express the fact that the two
outputs of vzip depend on both
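The dependence of both outputs on both inputs is easy to see in a scalar model. The sketch below models vzip and vuzp on two vectors of four 32-bit elements, assuming little-endian lane numbering; the function names are hypothetical and this is not the machine description pattern itself.

```c
#include <stdint.h>

enum { LANES = 4 };

/* Model of vzip: interleave a and b. The low result corresponds to
   interleave_low, the high result to interleave_high. */
void zip_model(const int32_t a[LANES], const int32_t b[LANES],
               int32_t lo[LANES], int32_t hi[LANES])
{
    int32_t tmp[2 * LANES];

    for (int i = 0; i < LANES; i++) {
        tmp[2 * i] = a[i];
        tmp[2 * i + 1] = b[i];
    }
    for (int i = 0; i < LANES; i++) {
        lo[i] = tmp[i];
        hi[i] = tmp[LANES + i];
    }
}

/* Model of vuzp: de-interleave the concatenation of a and b. The results
   correspond to extract_even and extract_odd. */
void unzip_model(const int32_t a[LANES], const int32_t b[LANES],
                 int32_t even[LANES], int32_t odd[LANES])
{
    int32_t tmp[2 * LANES];

    for (int i = 0; i < LANES; i++) {
        tmp[i] = a[i];
        tmp[LANES + i] = b[i];
    }
    for (int i = 0; i < LANES; i++) {
        even[i] = tmp[2 * i];
        odd[i] = tmp[2 * i + 1];
    }
}
```

Each of the four result vectors contains elements taken from both `a` and `b`, so a pattern that ties each output to only one input does not describe the instruction correctly.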
>
I am planning to post the patch here anyway, but since there are some
test failures I prefer to wait with this a bit (and your questions
make me feel even more insecure with the patch ;)).
Ira
> Thanks,
> Tejas.
>
>
>
> On Thu, 2011-01-27 at 15:44 +0200, Ira Rosen wrote:
>
Hi,
I am working on implementation of interleave_high/low and
extract_even/odd for NEON. The pairs of high/low (even/odd) are
"magically" united into single vzip (vuzp) instruction in the back
end, so there is no need for special support at the tree level. There
are still some test failures that
Hi,
* finished SLP for reduction patch. The loop in DenBench that needs
this feature also requires support of load permutation. I am
considering implementing that too. I looked for other occasions that
need this feature, but only found loops that are not vectorizable. So,
I am not sure I'll procee
Hi,
* Continued with testing and implementation of reduction support in SLP
* Found a major problem in vectorization of if-converted data
accesses. Looked into other ways to solve the problem.
* Spent some time on non-Linaro vectorization plans
* Unsuccessfully tried to make the board work
Ira
Hi,
* implemented reduction support in SLP, I'll check if it helps
DenBench next week
* helping Sebastian Pop with if-conversion for vectorization
improvements (BTW, Sebastian's goal is to vectorize kernels from
ffmpeg)
* fixed GCC PR47139
Ira
Hi,
* continued with my attempts to vectorize Viterbi:
- finished implementation of conditional store sinking in cselim
pass (I did only limited testing).
- reconsidered the idea of safe load if-conversion if an adjacent
field of the same structure is accessed unconditionally - this may be
i
Hi,
I was on vacation on Sunday and starting from Tuesday stayed home with
a sick child, so I only had a couple of days to work.
* vectorization of Viterbi:
- continued implementing conditional store sinking in cselim pass
- made if-conversion work on loads of structure fields if other
fie
Hi Ramana,
On 16 December 2010 20:11, Ramana Radhakrishnan
wrote:
> Hi Ira,
>
> On Thu, 2010-12-16 at 15:29 +0200, Ira Rosen wrote:
>> - telecom viterbi (vectorization potential gain is 4x) requires
>> conditional store sinking and load hoisting to enable if-conve
Hi,
I continued looking into EEMBC benchmarks:
- telecom fft is not vectorized because of an unknown number of iterations.
It has both non-constant step and its loop bound may overflow. I
think, the solution here could be loop versioning, but since
versioning increases code size, this kind of optimizati
- Continued looking into NEON special loads and stores.
- Benchmarks: concentrated on EEMBC Telecom:
- autcor gets vectorized
- viterbi, besides strided data accesses, needs to sink conditional
stores to allow if-conversion and make the main loop vectorizable.
Since the potential here is 4x,
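Sinking conditional stores can be illustrated on a small loop. The example below is made up for illustration (it is not the viterbi code, and the function names are hypothetical): in the first version each branch stores to the same location, and in the second the store has been moved past the branch so that only a value selection remains, which is the form if-conversion can turn into straight-line code.

```c
/* Before: a store in each arm of the branch. */
void clamp_branchy(int *out, const int *in, int n, int threshold)
{
    for (int i = 0; i < n; i++) {
        if (in[i] > threshold)
            out[i] = in[i] - threshold;
        else
            out[i] = 0;
    }
}

/* After sinking: one unconditional store of a selected value. */
void clamp_sunk(int *out, const int *in, int n, int threshold)
{
    for (int i = 0; i < n; i++) {
        int value = in[i] > threshold ? in[i] - threshold : 0;
        out[i] = value;
    }
}
```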
On 1 December 2010 17:57, Daniel Jacobowitz wrote:
> On Wed, Dec 01, 2010 at 11:16:16AM +0200, Ira Rosen wrote:
>> The meaning of the builtin (or maybe a new tree code would be better?)
>> is that the elements of v0, v1 and v2 are deinterleaved. I wanted the
>> MEM_REFs, s
On 30 November 2010 14:51, Julian Brown wrote:
>> >>> I think we need to somehow enhance MEM_REF, or maybe generate a
>> >>> MEM_REF for the first vector and a builtin after it.
>> >>
>> >> Yeah, keeping these things looking like memory references to most
>> >> of the compiler seems like a good p
On 25 November 2010 22:34, Michael Hope wrote:
> On Fri, Nov 26, 2010 at 2:35 AM, Ira Rosen wrote:
>> FFMPEG http://www.ffmpeg.org/ (got this from Rony Nandy from
>> User Platforms). It contains hand-vectorized code for NEON.
>> Investigating.
>
> I'm builid
On 22 November 2010 13:46, Ira Rosen wrote:
> On 17 November 2010 13:21, Julian Brown wrote:
>>> > We'd need to figure out what the RTL for such loads/stores should
>>> > look like, and whether it can represent alignment constraints, or
>>> > strides,
Hi,
- the struggle with the board took a lot of time
- continued to investigate special loads/stores
- looked for benchmarks:
EEMBC Consumer filters rgbcmy and rgbyiq should be vectorizable
once vld3, vst3/4 are supported
EEMBC Telecom viterbi is supposed to give 4x on NEON once
vector
On 17 November 2010 13:21, Julian Brown wrote:
>> > We'd need to figure out what the RTL for such loads/stores should
>> > look like, and whether it can represent alignment constraints, or
>> > strides, or loads/stores of multiple vector registers simultaneously.
Alignment info is kept in struct p
Hi,
This week I continued looking into vld/vst support in GCC.
I also fixed GCC PR 46312 - testsuite failures on ARM.
Ira
Hi,
On 17 November 2010 05:35, Michael Hope wrote:
> 1. How easy is it to frequently merge in SVN? It used to be terrible
> as you had to manually track the merges. These days can you do a 'svn
> merge trunk' and have it just work?
I asked Mike Meissner to answer this question. Mike is very ex
On 15 November 2010 17:33, Julian Brown wrote:
> On Mon, 15 Nov 2010 10:12:26 +0200
> Ira Rosen wrote:
>
> > Hi Julian,
> >
> > On 12 November 2010 17:49, Julian Brown
> > wrote:
> > > ...
> > > The important observation is that vectors from c
Hi Julian,
On 12 November 2010 17:49, Julian Brown wrote:
>
> For the first of these, I think we can get away with changing the
> vectorizer to use explicit "array" loads and stores (i.e. vldN/vstN), so
> that vector registers will hold elements in memory order -- so, all the
> contortions in th
Hi,
I started to look into mixed vector sizes (in the same loop). My main reason
for this was to allow widening and narrowing instructions, that have
different vector sizes for src and dest, to work properly. My example was
widen_mult (int = short * short), I thought its implementation was not
opt
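To make the size mismatch concrete: with 128-bit vectors, one vector holds 8 shorts but only 4 ints, so a widening multiply of one pair of input vectors produces two result vectors. The scalar model below is a sketch with hypothetical names; it is not how the vectorizer represents the operation internally.

```c
#include <stdint.h>

enum { SHORT_LANES = 8, INT_LANES = 4 };    /* lanes in a 128-bit vector */

/* Widening multiply of the low half of the inputs: 4 ints from 4 shorts. */
void widen_mult_lo(int32_t dst[INT_LANES],
                   const int16_t a[SHORT_LANES], const int16_t b[SHORT_LANES])
{
    for (int i = 0; i < INT_LANES; i++)
        dst[i] = (int32_t)a[i] * b[i];
}

/* Widening multiply of the high half of the inputs. */
void widen_mult_hi(int32_t dst[INT_LANES],
                   const int16_t a[SHORT_LANES], const int16_t b[SHORT_LANES])
{
    for (int i = 0; i < INT_LANES; i++)
        dst[i] = (int32_t)a[i + INT_LANES] * b[i + INT_LANES];
}
```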
On 9 November 2010 15:36, Andrew Stubbs wrote:
> On 09/11/10 12:55, Ira Rosen wrote:
>
>> * We can't really apply anything we want just for ourselves
>>
>> Why? It will be our "private" Linaro branch. We can apply whatever we
>> want there (we ca
On 9 November 2010 14:38, Andrew Stubbs wrote:
> Re my recent email "Upstream GCC feature freeze", I think we're agreed that
> we need to create a branch that tracks GCC 4.6 development, but has our own
> performance improvements included. The question is where to host it?
>
> Option 1: Launchpad
Julian Brown wrote on 05/11/2010 12:58:14 PM:
> I think it's probably fine to default to 128-bit vectors, and fall back
> to 64-bits when necessary (where access patterns block usage of wider
> vectors, or similar). AIUI, ARM were quite keen to get rid of
> -mvectorize-with-neon-quad altogether
On 8 November 2010 20:30, Chung-Lin Tang wrote:
> Still, I would like to see a 'linaro-trunk' branch under
> svn://gcc.gnu.org/svn/branches. It would actually serve a different purpose than
> a LP branch; the LP GCC 4.6 would probably eventually turn into Linaro 4.6,
> while a SVN branch would b
Julian Brown wrote on 03/11/2010 11:55:59 AM:
>
> On Mon, 1 Nov 2010 15:57:11 +0200
> Ira Rosen wrote:
>
> > It looks like it's enough to implement
> > targetm.vectorize.autovectorize_vector_sizes for NEON in order to enable initial
> > auto-detection of v
Hi,
It looks like it's enough to implement
targetm.vectorize.autovectorize_vector_sizes for NEON in order to enable initial
auto-detection of vector size. With the attached patch and
-mvectorize-with-neon-quad flag, the vectorizer first tries to vectorize
for 128 bit, and if this fails, it tries
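The try-wider-first behaviour described above can be sketched as a small driver loop. Everything here is a simplified model with hypothetical names (`pick_vector_size`, the callback type, and the sample predicates); it does not reproduce the actual hook or its signature, and it assumes that the fallback after 128 bits is 64 bits.

```c
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for "can this loop be vectorized with vectors of this size?" */
typedef bool (*try_vectorize_fn)(unsigned vector_bytes);

/* Try 128-bit vectors first, then 64-bit ones. Returns the chosen size in
   bytes, or 0 if the loop cannot be vectorized with either size. */
unsigned pick_vector_size(try_vectorize_fn try_vectorize)
{
    static const unsigned candidates[] = { 16, 8 };

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++)
        if (try_vectorize(candidates[i]))
            return candidates[i];

    return 0;
}

/* Sample predicates standing in for the real loop analysis. */
bool loop_fits_any_size(unsigned vector_bytes) { (void)vector_bytes; return true; }
bool loop_fits_64_only(unsigned vector_bytes) { return vector_bytes == 8; }
bool loop_fits_nothing(unsigned vector_bytes) { (void)vector_bytes; return false; }
```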
Joseph Myers wrote on 14/10/2010 05:18:37 PM:
> On Thu, 14 Oct 2010, Ira Rosen wrote:
>
> > Let me check that I understand the problem first: the problem is that
> > VLD1 and VST1 instructions in big endian mode follow the array numbering of
> > elements, while all o
Julian Brown wrote on 11/10/2010 04:29:15 PM:
> In further followups (at the risk of misrepresenting Joseph & Paul
> Brook's opinions!), there seemed to be general agreement that a scheme
> something like that outlined below, with "permuting" loads/stores and
> some way of handling multiple in-re
Hi Julian,
Here are some thoughts about your report.
> Automatic vector size selection/mixed-size vectors
> ==
I think we (I) need to cooperate with Richard Guenther: ask him about
committing his patch to 4.6 (they are probably planning to merge v
nd ARM®v7-R edition".
Ira
>
> Cheers,
>
> Julian
> [attachment "CS308-vectorization-improvements.txt" deleted by
> Ira Rosen/Haifa/IBM]